Introduction to TensorFlow, Deep Learning and Transfer Learning (work in progress)¶
- Project: Dog Vision 🐶👁 - Using computer vision to classify dog photos into different breeds.
- Goals: Learn TensorFlow, deep learning and transfer learning.
- Domain: Computer vision.
- Data: Images of dogs from Stanford Dogs Dataset (120 dog breeds, 20,000+ images).
- Problem type: Multi-class classification (120 different classes).
Welcome, welcome!
The focus of this notebook is to give a quick overview of deep learning with TensorFlow.
How?
We're going to go through the machine learning workflow steps and build a computer vision project to classify photos of dogs into their respective dog breed.
TK - image of workflow - e.g. dog photo -> model -> dog breed
# Quick timestamp
import datetime
print(f"Last updated: {datetime.datetime.now()}")
Last updated: 2024-01-26 00:25:40.650691
TK - What we're going to cover¶
In this project, we're going to be introduced to the power of deep learning and more specifically, transfer learning using TensorFlow.
We'll go through each of these in the context of the 6 step machine learning framework:
- Problem definition - Use computer vision to classify photos of dogs into different dog breeds.
- Data - 20,000+ images of dogs from 120 different dog breeds from the Stanford Dogs dataset.
- Evaluation - We'd like to beat the original paper's results (22% mean accuracy across all classes). Tip: a good way to practice your skills is to find some results online and try to beat them.
- Features - Because we're using deep learning, our model will learn the features on its own.
- Modelling - We're going to use a pretrained convolutional neural network (CNN) and transfer learning.
- Experiments - We'll try different amounts of data with the same model to see the effects on our results.
Note: It's okay not to know these exact steps ahead of time. When starting a new project, it's often the case you'll figure it out as you go. These steps are only filled out because I've had practice working on several machine learning projects. You'll pick up these ideas over time.
TK - Table of contents¶
- Problem type (e.g. multi-class classification)
- Domain (e.g. computer vision)
- Data type (e.g. unstructured vs structured)
TK - Where can you get help?¶
All of the materials for this course live on GitHub.
If you run into trouble, you can ask a question on the course GitHub Discussions page.
Quick definitions¶
Let's start by breaking down some of the most important topics we're going to go through.
TK - What is TensorFlow?¶
TensorFlow is an open source machine learning and deep learning framework originally developed by Google.
TK - Why use TensorFlow?¶
TensorFlow allows you to manipulate data and write deep learning algorithms using Python code.
It also has several built-in capabilities to leverage accelerated computing hardware (e.g. GPUs (Graphics Processing Units) and TPUs (Tensor Processing Units)).
Many of the world's largest companies power their machine learning workloads with TensorFlow.
TK - What is deep learning?¶
Deep learning is a form of machine learning where data passes through a series of progressive layers which all contribute to learning an overall representation of that data.
The series of progressive layers combines to form what's referred to as a neural network.
For example, a photo may be turned into numbers and those numbers are then manipulated mathematically through each progressive layer to learn patterns in the photo.
The "deep" in deep learning comes from the number of layers used in the neural network.
So when someone says deep learning (or artificial neural networks), they're typically referring to the same thing.
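To make the idea of layers concrete, here's a toy illustration in plain Python (not real TensorFlow, and with made-up weights rather than learned ones) of numbers flowing through two simple layers, each one a weighted sum followed by a nonlinearity:

```python
# Toy illustration only: real frameworks like TensorFlow do this with
# optimized tensor operations and *learned* weights (these are made up).
def layer(inputs, weights):
    # Each output is a weighted sum of the inputs passed through
    # a ReLU-style nonlinearity (negative values become 0)
    return [max(0.0, sum(i * w for i, w in zip(inputs, row))) for row in weights]

pixels = [0.2, 0.7, 0.1]  # a tiny "image" represented as numbers
hidden = layer(pixels, [[1.0, -0.5, 0.3], [0.2, 0.8, -0.1]])  # layer 1
output = layer(hidden, [[0.5, 0.5]])                          # layer 2
print(output)  # a single number a later layer (or a prediction) could use
```

Each layer transforms the previous layer's numbers, and stacking more layers is what makes the learning "deep".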
TK - What can deep learning be used for?¶
Deep learning is such a powerful technique that new use cases are being discovered every day.
Most of the modern forms of artificial intelligence (AI) applications you see are powered by deep learning.
ChatGPT uses deep learning to process text and return a response.
Tesla's self-driving cars use deep learning to power their computer vision systems.
Apple's Photos app uses deep learning to recognize faces in images and create Photo Memories.
Nutrify (an app my brother and I built) uses deep learning to recognize food in images.
TK - image of examples
TK - What is transfer learning?¶
Transfer learning is one of the most powerful and useful techniques in modern AI and machine learning.
It involves taking what one model (or neural network) has learned in a similar domain and applying it to your own.
In our case, we're going to use transfer learning to take the patterns a neural network has learned from the 1 million+ images and over 1000 classes in ImageNet (a gold standard computer vision benchmark) and apply them to our own problem of recognizing dog breeds.
The biggest benefit of transfer learning is that it often allows you to get outstanding results with less data and time.
TK - Transfer learning workflow - Large data -> Large model -> Patterns -> Custom data -> Custom model
TK - Getting setup¶
This section of the course is taught with Google Colab, an online Jupyter Notebook that provides free access to GPUs (Graphics Processing Units, we'll hear more on these later).
For a quick rundown on how to use Google Colab, see their introductory guide (it's quite similar to a Jupyter Notebook with a few different options).
Google Colab also comes with many data science and machine learning libraries, including TensorFlow, pre-installed.
Getting a GPU¶
Before running any code, we'll make sure our Google Colab instance is connected to a GPU.
You can do this via going to Runtime -> Change runtime type -> GPU (this may restart your existing runtime).
Why use a GPU?
Since neural networks perform a large amount of calculations behind the scenes (the main one being matrix multiplication), you need a computer chip that can perform these calculations quickly, otherwise you'll be waiting all day for a model to train.
And in short, GPUs are much faster at performing matrix multiplications than CPUs.
Why this is the case is beyond the scope of this project (you can search "why are GPUs faster than CPUs for machine learning?" for more).
The main thing to remember is: generally, in deep learning, GPUs = faster than CPUs.
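To see what's actually being sped up, here's a naive pure-Python matrix multiplication, the core operation inside a neural network. A GPU performs millions of these multiply-and-add operations in parallel; this sketch is just to show the arithmetic:

```python
def matmul(A, B):
    """Naive matrix multiply: the operation GPUs massively parallelize."""
    # zip(*B) transposes B so we can pair each row of A with each column of B
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # → [[19, 22], [43, 50]]
```

For two n x n matrices this takes on the order of n³ multiplications, which is why dedicated hardware matters as models grow.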
Note: A good experiment would be to run the neural networks we're going to build later on with and without a GPU and see the difference in their training times.
Ok, enough talking, let's start by importing TensorFlow!
We'll do so using the common abbreviation tf.
# TK - TODO: Check compatibility with Keras 3.0 by installing tf-nightly, see: https://x.com/fchollet/status/1719448117064659352?s=20
import tensorflow as tf
tf.__version__
'2.15.0'
Nice!
Note: If you want to run TensorFlow locally, you can follow the TensorFlow installation guide.
Now let's check to see if TensorFlow has access to a GPU (this isn't 100% required to complete this project but will speed things up dramatically).
We can do so with the method tf.config.list_physical_devices().
# Do we have access to a GPU?
device_list = tf.config.list_physical_devices()
if "GPU" in [device.device_type for device in device_list]:
    print(f"[INFO] TensorFlow has GPU available to use. Woohoo!! Computing will be sped up!")
    print(f"[INFO] Accessible devices:\n{device_list}")
else:
    print(f"[INFO] TensorFlow does not have GPU available to use. Models may take a while to train.")
    print(f"[INFO] Accessible devices:\n{device_list}")
[INFO] TensorFlow has GPU available to use. Woohoo!! Computing will be sped up! [INFO] Accessible devices: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
TK - Getting Data¶
There are several options and locations to get data for a deep learning project.
| Resource | Description |
|---|---|
| Kaggle Datasets | A collection of datasets across a wide range of topics. |
| TensorFlow Datasets | A collection of ready-to-use machine learning datasets ready for use under the tf.data.Datasets API. You can see a list of all available datasets in the TensorFlow documentation. |
| Hugging Face Datasets | A continually growing resource of datasets broken into several different kinds of topics. |
| Google Dataset Search | A search engine by Google specifically focused on searching online datasets. |
| Original sources | Datasets which are made available by researchers or companies with the release of a product or research paper (sources for these will vary, they could be a link on a website or a link to an application form). |
| Custom datasets | These are datasets comprised of your own custom source of data. You may build these from scratch on your own or have access to them from an existing product or service. For example, your entire photos library could be your own custom dataset or your entire notes and documents folder or your company's custom order history. |
In our case, the dataset we're going to use is called the Stanford Dogs dataset (or ImageNet dogs, as the images are dogs separated from ImageNet).
Because the Stanford Dogs dataset has been around for a while (since 2011, which as of writing this in 2023 is like a lifetime in deep learning), it's available from several resources:
- The original project website via link download
- Inside TensorFlow Datasets under stanford_dogs
- On Kaggle as a downloadable dataset
The point here is that when you're starting out with practicing deep learning projects, there's no shortage of datasets available.
However, when you start wanting to work on your own projects or within a company environment, you'll likely start to work on custom datasets (datasets you build yourself or aren't available publicly online).
The main difference between existing datasets and custom datasets is that existing datasets often come preformatted and ready to use.
Whereas custom datasets often require some preprocessing before they're ready to use within a machine learning project.
To practice formatting a dataset for a machine learning problem, we're going to download the Stanford Dogs dataset from the original website.
Before we do so, the following code is an example of how we'd get the Stanford Dogs dataset from TensorFlow Datasets.
# Download the dataset into train and test split using TensorFlow Datasets
# import tensorflow_datasets as tfds
# ds_train, ds_test = tfds.load('stanford_dogs', split=['train', 'test'])
TK - Download data directly from Stanford Dogs website¶
Our overall project goal is to build a computer vision model which performs better than the original Stanford Dogs paper (average of 22% accuracy per class across 120 classes).
To do so, we need some data.
Let's download the original Stanford Dogs dataset from the project website.
The data comes in three main files:
- Images (757MB) - images.tar
- Annotations (21MB) - annotation.tar
- Lists, with train/test splits (0.5MB) - lists.tar
Our goal is to get a file structure like this:
dog_vision_data/
images.tar
annotation.tar
lists.tar
Note: If you're using Google Colab for this project, remember that any data uploaded to the Google Colab session gets deleted if the session disconnects. So to save us redownloading the data every time, we're going to download it once and save it to Google Drive.
Resource: For a good guide on getting data in and out of Google Colab, see the Google Colab io.ipynb tutorial.
To make sure we don't have to keep redownloading the data every time we leave and come back to Google Colab, we're going to:
- Download the data if it doesn't already exist on Google Drive.
- Copy it to Google Drive (because Google Colab connects nicely with Google Drive) if it isn't already there.
- If the data already exists on Google Drive (we've been through steps 1 & 2), we'll import it instead.
There are two main options to connect Google Colab instances to Google Drive:
- Click "Mount Drive" in "Files" menu on the left.
- Mount programmatically with from google.colab import drive -> drive.mount('/content/drive').
More specifically, we're going to follow the following steps:
- Mount Google Drive.
- Setup constants such as our base directory to save files to, the target files we'd like to download and target URL we'd like to download from.
- Setup our target local path to save to.
- Check if the target files all exist in Google Drive and if they do, copy them locally.
- If the target files don't exist in Google Drive, download them from the target URL with the !wget command.
- Create a folder on Google Drive to store the downloaded files.
- Copy the downloaded files to Google Drive for use later if needed.
A fair few steps, but nothing we can't handle!
Plus, this is all good practice for dealing with and manipulating data, a very important skill in the machine learning engineer's toolbox.
from pathlib import Path

from google.colab import drive

# 1. Mount Google Drive (this will bring up a pop-up to sign-in/authenticate)
# Note: This step is specifically for Google Colab, if you're working locally, you may need a different setup
drive.mount("/content/drive")

# 2. Setup constants
# Note: For constants like this, you'll often see them created as variables with all capitals
TARGET_DRIVE_PATH = Path("drive/MyDrive/tensorflow/dog_vision_data")
TARGET_FILES = ["images.tar", "annotation.tar", "lists.tar"]
TARGET_URL = "http://vision.stanford.edu/aditya86/ImageNetDogs"

# 3. Setup local path
local_dir = Path("dog_vision_data")

# 4. Check if the target files exist in Google Drive, if so, copy them to Google Colab
if all((TARGET_DRIVE_PATH / file).is_file() for file in TARGET_FILES):
    print(f"[INFO] Copying Dog Vision files from Google Drive to local directory...")
    print(f"[INFO] Source dir: {TARGET_DRIVE_PATH} -> Target dir: {local_dir}")
    !cp -r {TARGET_DRIVE_PATH} .
    print("[INFO] Good to go!")
else:
    # 5. If the files don't exist in Google Drive, download them
    print(f"[INFO] Target files not found in Google Drive.")
    print(f"[INFO] Downloading the target files... this shouldn't take too long...")
    for file in TARGET_FILES:
        # wget is short for "world wide web get", as in "get a file from the web"
        # -nc or --no-clobber = don't download files that already exist locally
        # -P = save the target file to a specified prefix, in our case, local_dir
        !wget -nc {TARGET_URL}/{file} -P {local_dir} # the "!" means to execute the command on the command line rather than in Python

    print(f"[INFO] Saving the target files to Google Drive, so they can be loaded later...")

    # 6. Ensure target directory in Google Drive exists
    TARGET_DRIVE_PATH.mkdir(parents=True, exist_ok=True)

    # 7. Copy downloaded files to Google Drive (so we can use them later and not have to re-download them)
    !cp -r {local_dir}/* {TARGET_DRIVE_PATH}/
Mounted at /content/drive [INFO] Copying Dog Vision files from Google Drive to local directory... [INFO] Source dir: drive/MyDrive/tensorflow/dog_vision_data -> Target dir: dog_vision_data [INFO] Good to go!
Data downloaded!
Nice work!
Now if we get the contents of local_dir (dog_vision_data), what do we get?
We can first make sure it exists with Path.exists() and then we can iterate through its contents with Path.iterdir() and print out the .name attribute of each file.
if local_dir.exists():
    print(str(local_dir) + "/")
    for item in local_dir.iterdir():
        print(" ", item.name)
dog_vision_data/ images.tar annotation.tar lists.tar
Excellent! That's exactly the format we wanted.
Now you might've noticed that each file ends in .tar.
What's this?
Searching "what is .tar?", I found:
In computing, tar is a computer software utility for collecting many files into one archive file, often referred to as a tarball, for distribution or backup purposes.
Source: Wikipedia tar page.
Exploring a bit more, I found that the .tar format is similar to .zip, however, .zip offers compression, whereas .tar mostly combines many files into one.
So how do we "untar" the files in images.tar, annotation.tar and lists.tar?
We can use the !tar command (or just tar from outside of a Jupyter Cell)!
Doing this will expand all of the files within each of the .tar archives.
We'll also use a couple of flags to help us out:
- The -x flag tells tar to extract files from an archive.
- The -f flag specifies that the following argument is the name of the archive file.
- You can combine flags by putting them together, e.g. -xf.
Let's try it out!
# Untar images
# -x = extract files from an archive
# -f = tell tar which file to deal with
!tar -xf dog_vision_data/images.tar
!tar -xf dog_vision_data/annotation.tar
!tar -xf dog_vision_data/lists.tar
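As an aside, if you'd rather stay in pure Python than use the !tar shell command, the standard library's tarfile module can do the same job (a sketch, the untar helper name is my own):

```python
import tarfile

def untar(archive_path: str, destination: str = ".") -> None:
    """Extract a .tar archive, the Python equivalent of `!tar -xf archive_path`."""
    with tarfile.open(archive_path) as archive:
        archive.extractall(destination)

# Example usage (same result as the cells above):
# untar("dog_vision_data/images.tar")
```

This is handy outside of notebooks, where the `!` shell magic isn't available.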
What new files did we get?
We can check in Google Colab by inspecting the "Files" tab on the left.
Or with Python by using os.listdir(".") where "." means "the current directory".
import os
os.listdir(".") # "." stands for "here" or "current directory"
['.config', 'file_list.mat', 'test_list.mat', 'Images', 'dog_vision_data', 'drive', 'train_list.mat', 'Annotation', 'sample_data']
Ooooh!
Looks like we've got some new files!
Specifically:
- train_list.mat - a list of all the training set images.
- test_list.mat - a list of all the testing set images.
- Images/ - a folder containing all of the images of dogs.
- Annotation/ - a folder containing all of the annotations for each image.
- file_list.mat - a list of all the files (training and test lists combined).
Our next step is to go through them and see what we've got.
Exploring the data¶
Once you've got a dataset, before building a model, it's wise to explore it for a bit to see what kind of data you're working with.
- TK - things you should do when you start with a new dataset
- visualize
- check the distributions (e.g. number of samples per class)
TK - daniel bourke tweet about abraham loss function - https://twitter.com/mrdbourke/status/1456087631641473033
Discussing our target data format¶
Since our goal is to build a computer vision model to classify dog breeds, we need a way to tell our model what breed of dog is in what image.
A common data format for a classification problem is to have samples stored in folders named after their class name.
For example:
images_split/
├── train/
│ ├── class_1/
│ │ ├── train_image1.jpg
│ │ ├── train_image2.jpg
│ │ └── ...
│ ├── class_2/
│ │ ├── train_image1.jpg
│ │ ├── train_image2.jpg
│ │ └── ...
└── test/
├── class_1/
│ ├── test_image1.jpg
│ ├── test_image2.jpg
│ └── ...
├── class_2/
│ ├── test_image1.jpg
│ ├── test_image2.jpg
...
In the case of dog images, we'd put all of the images labelled "chihuahua" in a folder called chihuahua/ (and so on for all the other classes and images).
We could split these folders so that training images go in train/chihuahua/ and testing images go in test/chihuahua/.
This is what we'll be working towards creating.
Note: This folder structure doesn't just work for images, it works for text, audio and other kinds of classification data too.
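As a sketch of what we'll be working towards, here's how such a split/class folder structure could be created with pathlib (the function name and example class names here are hypothetical):

```python
from pathlib import Path

def make_class_folders(root: str, splits: list, class_names: list) -> None:
    """Create a root/split/class_name folder structure for classification data."""
    for split in splits:
        for class_name in class_names:
            # parents=True creates intermediate folders,
            # exist_ok=True avoids errors on re-runs
            Path(root, split, class_name).mkdir(parents=True, exist_ok=True)

# Example usage (hypothetical class names):
# make_class_folders("images_split", ["train", "test"], ["chihuahua", "beagle"])
```

Once the folders exist, copying each image into its matching split/class folder gives us the structure above.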
Exploring the file lists¶
How about we check out the train_list.mat, test_list.mat and file_list.mat files?
Searching online for "what is a .mat file?", I found that it's a MATLAB file. Before Python became the default language for machine learning and deep learning, many models and datasets were built in MATLAB.
Then I searched, "how to open a .mat file with Python?" and found an answer on Stack Overflow saying I could use the scipy library (a scientific computing library).
The good news is, Google Colab comes with scipy preinstalled.
We can use the scipy.io.loadmat() method to open a .mat file.
import scipy
# Open lists of train and test .mat
train_list = scipy.io.loadmat("train_list.mat")
test_list = scipy.io.loadmat("test_list.mat")
file_list = scipy.io.loadmat("file_list.mat")
# Let's inspect the output and type of the train_list
train_list, type(train_list)
({'__header__': b'MATLAB 5.0 MAT-file, Platform: GLNXA64, Created on: Sun Oct 9 08:36:13 2011',
'__version__': '1.0',
'__globals__': [],
'file_list': array([[array(['n02085620-Chihuahua/n02085620_5927.jpg'], dtype='<U38')],
[array(['n02085620-Chihuahua/n02085620_4441.jpg'], dtype='<U38')],
[array(['n02085620-Chihuahua/n02085620_1502.jpg'], dtype='<U38')],
...,
[array(['n02116738-African_hunting_dog/n02116738_6754.jpg'], dtype='<U48')],
[array(['n02116738-African_hunting_dog/n02116738_9333.jpg'], dtype='<U48')],
[array(['n02116738-African_hunting_dog/n02116738_2503.jpg'], dtype='<U48')]],
dtype=object),
'annotation_list': array([[array(['n02085620-Chihuahua/n02085620_5927'], dtype='<U34')],
[array(['n02085620-Chihuahua/n02085620_4441'], dtype='<U34')],
[array(['n02085620-Chihuahua/n02085620_1502'], dtype='<U34')],
...,
[array(['n02116738-African_hunting_dog/n02116738_6754'], dtype='<U44')],
[array(['n02116738-African_hunting_dog/n02116738_9333'], dtype='<U44')],
[array(['n02116738-African_hunting_dog/n02116738_2503'], dtype='<U44')]],
dtype=object),
'labels': array([[ 1],
[ 1],
[ 1],
...,
[120],
[120],
[120]], dtype=uint8)},
dict)
Okay, looks like we get a dictionary with several fields we may be interested in.
Let's check out the keys of the dictionary.
train_list.keys()
dict_keys(['__header__', '__version__', '__globals__', 'file_list', 'annotation_list', 'labels'])
My guess is that the file_list key is what we're after, as this looks like a large array of image names (the files all end in .jpg).
How about we see how many files are in each file_list key?
# Check the length of the file_list key
print(f"Number of files in training list: {len(train_list['file_list'])}")
print(f"Number of files in testing list: {len(test_list['file_list'])}")
print(f"Number of files in full list: {len(file_list['file_list'])}")
Number of files in training list: 12000 Number of files in testing list: 8580 Number of files in full list: 20580
Beautiful! Looks like these lists contain our training and test splits and the full list has a list of all the files in the dataset.
Let's inspect the train_list['file_list'] further.
train_list['file_list']
array([[array(['n02085620-Chihuahua/n02085620_5927.jpg'], dtype='<U38')],
[array(['n02085620-Chihuahua/n02085620_4441.jpg'], dtype='<U38')],
[array(['n02085620-Chihuahua/n02085620_1502.jpg'], dtype='<U38')],
...,
[array(['n02116738-African_hunting_dog/n02116738_6754.jpg'], dtype='<U48')],
[array(['n02116738-African_hunting_dog/n02116738_9333.jpg'], dtype='<U48')],
[array(['n02116738-African_hunting_dog/n02116738_2503.jpg'], dtype='<U48')]],
dtype=object)
Looks like we've got an array of arrays.
How about we turn them into a Python list for easier handling?
We can do so by extracting each individual item via indexing and list comprehension.
Let's see what it's like to get a single file name.
# Get a single filename
train_list['file_list'][0][0][0]
'n02085620-Chihuahua/n02085620_5927.jpg'
Now let's get a Python list of all the individual file names (e.g. n02097130-giant_schnauzer/n02097130_2866.jpg) so we can use them later.
# Get a Python list of all file names for each list
train_file_list = [item[0][0] for item in train_list["file_list"]]
test_file_list = [item[0][0] for item in test_list["file_list"]]
full_file_list = [item[0][0] for item in file_list["file_list"]]
len(train_file_list), len(test_file_list), len(full_file_list)
(12000, 8580, 20580)
Wonderful!
How about we view a random sample of the filenames we extracted?
Note: One of my favourite things to do whilst exploring data is to continually view random samples of it. Whether it be file names or images or text snippets. Why? You can always view the first X number of samples, however, I find that continually viewing random samples of the data gives you a better overview of the different kinds of data you're working with. It also gives you the small chance of stumbling upon a potential error.
We can view random samples of the data using Python's random.sample() method.
import random
random.sample(train_file_list, k=10)
['n02102480-Sussex_spaniel/n02102480_4380.jpg', 'n02106662-German_shepherd/n02106662_19641.jpg', 'n02107908-Appenzeller/n02107908_2151.jpg', 'n02087046-toy_terrier/n02087046_2158.jpg', 'n02105056-groenendael/n02105056_537.jpg', 'n02088632-bluetick/n02088632_916.jpg', 'n02108000-EntleBucher/n02108000_2357.jpg', 'n02098286-West_Highland_white_terrier/n02098286_763.jpg', 'n02102177-Welsh_springer_spaniel/n02102177_1257.jpg', 'n02105505-komondor/n02105505_2083.jpg']
Now let's do a quick check to make sure none of the training image file names appear in the testing image file names list.
This is important because the number 1 rule in machine learning is: always keep the test set separate from the training set.
We can check that there are no overlaps by turning train_file_list into a Python set() and using the intersection() method.
# How many files in the training set intersect with the testing set?
len(set(train_file_list).intersection(test_file_list))
0
Excellent! Looks like there are no overlaps.
We could even put an assert check to raise an error if there are any overlaps (e.g. the length of the intersection is greater than 0).
assert works in the fashion: assert expression, message_if_expression_fails.
If the assert check doesn't output anything, we're good to go!
# Make an assertion statement to check there are no overlaps (try changing test_file_list to train_file_list to see how it works)
assert len(set(train_file_list).intersection(test_file_list)) == 0, "There are overlaps between the training and test set files, please check them."
Woohoo!
Looks like there are no overlaps, let's keep exploring the data.
Exploring the Annotation folder¶
How about we look at the Annotation folder next?
We can click the folder on the file explorer on the left to see what's inside.
But we can also explore the contents of the folder with Python.
Let's use os.listdir() to see what's inside.
os.listdir("Annotation")[:10]
['n02090622-borzoi', 'n02106662-German_shepherd', 'n02097298-Scotch_terrier', 'n02093647-Bedlington_terrier', 'n02108915-French_bulldog', 'n02097474-Tibetan_terrier', 'n02110958-pug', 'n02101388-Brittany_spaniel', 'n02110806-basenji', 'n02101006-Gordon_setter']
Looks like there are folders, each named after a dog breed, with several numbered annotation files inside.
Each of the files contains an XML-style annotation relating to an image.
For example, Annotation/n02085620-Chihuahua/n02085620_10074:
<annotation>
<folder>02085620</folder>
<filename>n02085620_10074</filename>
<source>
<database>ImageNet database</database>
</source>
<size>
<width>333</width>
<height>500</height>
<depth>3</depth>
</size>
<segment>0</segment>
<object>
<name>Chihuahua</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>25</xmin>
<ymin>10</ymin>
<xmax>276</xmax>
<ymax>498</ymax>
</bndbox>
</object>
</annotation>
The fields include the name of the image, the size of the image, the label of the object and where it is (bounding box coordinates).
If we were performing object detection (finding the location of a thing in an image), we'd pay attention to the <bndbox> coordinates.
However, since we're focused on classification, our main consideration is the mapping of image name to class name.
Since we're dealing with 120 classes of dog breed, let's write a function to check the number of subfolders in the Annotation directory (there should be 120 subfolders, one for each breed of dog).
To do so, we can use Python's pathlib.Path class, along with Path.iterdir() to loop over the contents of Annotation and Path.is_dir() to check if the target item is a directory.
from pathlib import Path
def count_subfolders(directory_path: str) -> int:
    """
    Count the number of subfolders in a given directory.

    Args:
        directory_path (str): The path to the directory in which to count subfolders.

    Returns:
        int: The number of subfolders in the specified directory.

    Examples:
        >>> count_subfolders('/path/to/directory')
        3 # if there are 3 subfolders in the specified directory
    """
    return len([name for name in Path(directory_path).iterdir() if name.is_dir()])
directory_path = "Annotation"
folder_count = count_subfolders(directory_path)
print(f"Number of subfolders in {directory_path} directory: {folder_count}")
Number of subfolders in Annotation directory: 120
Perfect!
There are 120 subfolders of annotations, one for each class of dog we'd like to identify.
But on further inspection of our file lists, it looks like the class name is already in the filepath.
# View a single training file pathname
train_file_list[0]
'n02085620-Chihuahua/n02085620_5927.jpg'
With this information, we know that image n02085620_5927.jpg should contain a Chihuahua.
Let's check.
I searched "how to display an image in Google Colab" and found another answer on Stack Overflow.
Turns out you can use IPython.display.Image(), as Google Colab comes with IPython (Interactive Python) built-in.
from IPython.display import Image
Image(Path("Images", train_file_list[0]))
Woah!
We get an image of a dog!
Exploring the Images folder¶
We've explored the Annotations folder, now let's check out our Images folder.
We know that the image file names come in the format class_name/image_name, for example, n02085620-Chihuahua/n02085620_5927.jpg.
To make things a little simpler, let's create the following:
- A mapping from folder name -> class name in dictionary form, for example, {'n02113712-miniature_poodle': 'miniature_poodle', 'n02092339-Weimaraner': 'weimaraner', 'n02093991-Irish_terrier': 'irish_terrier'...}. This will help us when visualizing our data from its original folder.
- A list of all unique dog class names with simple formatting, for example, ['affenpinscher', 'afghan_hound', 'african_hunting_dog', 'airedale', 'american_staffordshire_terrier'...].
Let's start by getting a list of all the folders in the Images directory with os.listdir().
# Get a list of all image folders
image_folders = os.listdir("Images")
image_folders[:10]
['n02090622-borzoi', 'n02106662-German_shepherd', 'n02097298-Scotch_terrier', 'n02093647-Bedlington_terrier', 'n02108915-French_bulldog', 'n02097474-Tibetan_terrier', 'n02110958-pug', 'n02101388-Brittany_spaniel', 'n02110806-basenji', 'n02101006-Gordon_setter']
Excellent!
Now let's make a dictionary which maps from the folder name to a simplified version of the class name, for example:
{'n02085782-Japanese_spaniel': 'japanese_spaniel',
'n02106662-German_shepherd': 'german_shepherd',
'n02093256-Staffordshire_bullterrier': 'staffordshire_bullterrier',
...}
# Create folder name -> class name dict
folder_to_class_name_dict = {}
for folder_name in image_folders:
    # Turn folder name into class_name
    # E.g. "n02089078-black-and-tan_coonhound" -> "black_and_tan_coonhound"
    # We'll split on the first "-" and join the rest of the string with "_" and then lower it
    class_name = "_".join(folder_name.split("-")[1:]).lower()
    folder_to_class_name_dict[folder_name] = class_name

# Make sure there are 120 entries in the dictionary
assert len(folder_to_class_name_dict) == 120
Folder name to class name mapping created, let's view the first 10.
sorted(folder_to_class_name_dict.items())[:10]
[('n02085620-Chihuahua', 'chihuahua'),
('n02085782-Japanese_spaniel', 'japanese_spaniel'),
('n02085936-Maltese_dog', 'maltese_dog'),
('n02086079-Pekinese', 'pekinese'),
('n02086240-Shih-Tzu', 'shih_tzu'),
('n02086646-Blenheim_spaniel', 'blenheim_spaniel'),
('n02086910-papillon', 'papillon'),
('n02087046-toy_terrier', 'toy_terrier'),
('n02087394-Rhodesian_ridgeback', 'rhodesian_ridgeback'),
('n02088094-Afghan_hound', 'afghan_hound')]
And we can get a list of unique dog names by getting the values() of the folder_to_class_name_dict and turning it into a list.
dog_names = sorted(folder_to_class_name_dict.values())
dog_names[:10]
['affenpinscher', 'afghan_hound', 'african_hunting_dog', 'airedale', 'american_staffordshire_terrier', 'appenzeller', 'australian_terrier', 'basenji', 'basset', 'beagle']
Perfect!
Now we've got:
folder_to_class_name_dict- a mapping from the folder name to the class name.dog_names- a list of all the unique dog breeds we're working with.
Visualize a group of random images¶
How about we follow the data explorer's motto of visualize, visualize, visualize and view some random images?
To help us visualize, let's create a function that takes in a list of image paths and then randomly selects 10 of those paths to display.
The function will:
- Take in a select list of image paths.
- Create a grid of matplotlib plots (e.g. 2x5 = 10 plots to plot on).
- Randomly sample 10 image paths from the input image path list (using random.sample()).
- Iterate through the flattened axes via axes.flat which is a reference to the attribute numpy.ndarray.flat.
- Extract the sample path from the list of samples.
- Get the sample title from the parent folder of the path using Path.parent.stem and then extract the formatted dog breed name by indexing folder_to_class_name_dict.
- Read the image with plt.imread() and show it on the target ax with ax.imshow().
- Set the title of the plot to the parent folder name with ax.set_title() and turn the axis marks off with ax.axis("off") (this makes for pretty plots).
- Show the plot with plt.show().
Woah!
A lot of steps! But nothing we can't handle, let's do it.
from typing import List
from pathlib import Path
import matplotlib.pyplot as plt
import random
# 1. Take in a select list of image paths
def plot_10_random_images_from_path_list(path_list: List[Path],
extract_title=True) -> None:
# 2. Set up a grid of plots
fig, axes = plt.subplots(nrows=2, ncols=5, figsize=(20, 10))
# 3. Randomly sample 10 paths from the list
samples = random.sample(path_list, 10)
# 4. Iterate through the flattened axes and corresponding sample paths
for i, ax in enumerate(axes.flat):
# 5. Get the target sample path (e.g. "Images/n02087394-Rhodesian_ridgeback/n02087394_1161.jpg")
sample_path = samples[i]
# 6. Extract the parent directory name to use as the title (if necessary)
# (e.g. n02087394-Rhodesian_ridgeback/n02087394_1161.jpg -> n02087394-Rhodesian_ridgeback -> rhodesian_ridgeback)
if extract_title:
sample_title = folder_to_class_name_dict[sample_path.parent.stem]
else:
sample_title = sample_path.parent.stem
# 7. Read the image file and plot it on the corresponding axis
ax.imshow(plt.imread(sample_path))
# 8. Set the title of the axis and turn off the axis (for pretty plots)
ax.set_title(sample_title)
ax.axis("off")
# 9. Display the plot
plt.show()
plot_10_random_images_from_path_list(path_list=[Path("Images") / Path(file) for file in train_file_list])
Those are some nice looking dogs!
What I like to do here is rerun the random visualizations until I've seen 100+ samples so I've got an idea of the data we're working with.
Question: Here's something to think about, how would you code a system to differentiate between all the different breeds of dogs? Perhaps you write an algorithm to look at the shapes or the colours? You might be thinking "that would take quite a long time..." And you'd be right. Then how would we do it? Machine learning of course!
Exploring the distribution of our data¶
After visualization, another valuable way to explore the data is by checking the data distribution.
Distribution refers to the "spread" of data.
In our case, how many images of dogs do we have per breed?
A balanced distribution would mean having roughly the same number of images for each breed (e.g. 100 images per dog breed).
Note: There's a deeper level of distribution than just images per dog breed. Ideally, the images for each breed are well distributed too. For example, we wouldn't want 100 copies of the same image per dog breed. Not only would we like a similar number of images per breed, we'd also like the images of each particular breed to cover different scenarios, different lighting and different angles. We want this because we want our model to be able to recognize the correct dog breed no matter what angle the photo is taken from.
To figure out how many images we have per class, let's write a function to count the number of images per subfolder in a given directory.
Specifically, we'll want the function to:
- Take in a target directory/folder.
- Create a list of all the subdirectories/subfolders in the target folder.
- Create an empty list, image_class_counts, to append subfolders and their counts to.
- Iterate through all of the subdirectories.
- Get the class name of the target folder as the name of the folder.
- Count the number of images in the target folder using the length of the list of image paths (we can get these with Path().rglob("*.jpg") where *.jpg means "all files with the extension .jpg").
- Append a dictionary of {"class_name": class_name, "image_count": image_count} to the image_class_counts list (we create a list of dictionaries so we can turn this into a pandas DataFrame).
- Return the image_class_counts list.
# Create a dictionary of image counts
from pathlib import Path
from typing import List, Dict
# 1. Take in a target directory
def count_images_in_subdirs(target_directory: str) -> List[Dict[str, int]]:
"""
Counts the number of JPEG images in each subdirectory of the given directory.
Each subdirectory is assumed to represent a class, and the function counts
the number of '.jpg' files within each one. The result is a list of
dictionaries with the class name and corresponding image count.
Args:
target_directory (str): The path to the directory containing subdirectories.
Returns:
List[Dict[str, int]]: A list of dictionaries with 'class_name' and 'image_count' for each subdirectory.
Examples:
>>> count_images_in_subdirs('/path/to/directory')
[{'class_name': 'beagle', 'image_count': 50}, {'class_name': 'poodle', 'image_count': 60}]
"""
# 2. Create a list of all the subdirectories in the target directory (these contain our images)
images_dir = Path(target_directory)
image_class_dirs = [directory for directory in images_dir.iterdir() if directory.is_dir()]
# 3. Create an empty list to append image counts to
image_class_counts = []
# 4. Iterate through all of the subdirectories
for image_class_dir in image_class_dirs:
# 5. Get the class name from image directory (e.g. "Images/n02116738-African_hunting_dog" -> "n02116738-African_hunting_dog")
class_name = image_class_dir.stem
# 6. Count the number of images in the target subdirectory
image_count = len(list(image_class_dir.rglob("*.jpg"))) # count all files with the .jpg file extension
# 7. Append a dictionary of class name and image count to count list
image_class_counts.append({"class_name": class_name,
"image_count": image_count})
# 8. Return the list
return image_class_counts
Ho ho, what a function!
Let's run it on our target directory Images and view the first few indexes.
image_class_counts = count_images_in_subdirs("Images")
image_class_counts[:3]
[{'class_name': 'n02090622-borzoi', 'image_count': 151},
{'class_name': 'n02106662-German_shepherd', 'image_count': 152},
{'class_name': 'n02097298-Scotch_terrier', 'image_count': 158}]
Nice!
Since our image_class_counts variable is in the form of a list of dictionaries, we can turn it into a pandas DataFrame.
Let's sort the DataFrame by "image_count" so the classes with the most images appear at the top. We can do so with DataFrame.sort_values().
# Create a DataFrame
import pandas as pd
image_counts_df = pd.DataFrame(image_class_counts).sort_values(by="image_count", ascending=False)
image_counts_df.head()
| class_name | image_count | |
|---|---|---|
| 74 | n02085936-Maltese_dog | 252 |
| 77 | n02088094-Afghan_hound | 239 |
| 69 | n02092002-Scottish_deerhound | 232 |
| 28 | n02112018-Pomeranian | 219 |
| 75 | n02107683-Bernese_mountain_dog | 218 |
And let's clean up the "class_name" column to be more readable by mapping the values through our folder_to_class_name_dict.
# Make class name column easier to read
image_counts_df["class_name"] = image_counts_df["class_name"].map(folder_to_class_name_dict)
image_counts_df.head()
| class_name | image_count | |
|---|---|---|
| 74 | maltese_dog | 252 |
| 77 | afghan_hound | 239 |
| 69 | scottish_deerhound | 232 |
| 28 | pomeranian | 219 |
| 75 | bernese_mountain_dog | 218 |
Now that we've got a DataFrame of image counts per class, we can make it more visual by turning it into a plot.
We covered plotting data directly from pandas DataFrame's in Section 3 of the Introduction to Matplotlib notebook: Plotting data directly with pandas.
To do so, we can use image_counts_df.plot(kind="bar", ...) along with some other customization.
# Turn the image counts DataFrame into a graph
import matplotlib.pyplot as plt
plt.figure(figsize=(14, 7))
image_counts_df.plot(kind="bar",
x="class_name",
y="image_count",
legend=False,
ax=plt.gca()) # plt.gca() = "get current axis", get the plt we setup above and put the data there
# Add customization
plt.ylabel("Image Count")
plt.title("Total Image Counts by Class")
plt.xticks(rotation=90, # Rotate the x labels for better visibility
fontsize=8) # Make the font size smaller for easier reading
plt.tight_layout() # Ensure things fit nicely
plt.show()
Beautiful! It looks like our classes are quite balanced. Each breed of dog has ~150 or more images.
We can find out some other quick stats about our data with DataFrame.describe().
# Get various statistics about our data distribution
image_counts_df.describe()
| image_count | |
|---|---|
| count | 120.000000 |
| mean | 171.500000 |
| std | 23.220898 |
| min | 148.000000 |
| 25% | 152.750000 |
| 50% | 159.500000 |
| 75% | 186.250000 |
| max | 252.000000 |
And the table shows a similar story to the plot. We can see the minimum number of images per class is 148, whereas the maximum is 252.
If one class had 10x fewer images than another class, we might look into collecting more data to improve the balance.
The main takeaway(s):
- When working on a classification problem, ideally, all classes have a similar number of samples (however, in some problems this may be unattainable, such as fraud detection, where you may have 1000x more "not fraud" samples than "fraud" samples).
- If you wanted to add a new class of dog breed to the existing 120, ideally, you'd have at least ~150 images for it.
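To make the balance check concrete, here's a minimal sketch that quantifies imbalance as the ratio of the largest class count to the smallest (the three classes below are a small stand-in for the full 120-row DataFrame, using counts from the outputs above):

```python
import pandas as pd

# Small stand-in for the full image_counts_df (the real one has 120 rows)
image_counts_df = pd.DataFrame([
    {"class_name": "maltese_dog", "image_count": 252},
    {"class_name": "scotch_terrier", "image_count": 158},
    {"class_name": "borzoi", "image_count": 151},
])

# Ratio of the largest class to the smallest:
# close to 1 = balanced, 10+ = consider collecting more data for the smaller classes
imbalance_ratio = image_counts_df["image_count"].max() / image_counts_df["image_count"].min()
print(f"Imbalance ratio: {imbalance_ratio:.2f}")
```

Here the ratio comes out well under 2, a long way from the 10x gap that would prompt collecting more data.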
TK - Creating training and test data split directories¶
After exploring the data, one of the next best things you can do is create experimental data splits.
This includes:
| Set Name | Description | Typical Percentage of Data |
|---|---|---|
| Training Set | A dataset for the model to learn on | 70-80% |
| Testing Set | A dataset for the model to be evaluated on | 20-30% |
| (Optional) Validation Set | A dataset to tune the model on | 50% of the test data |
| (Optional) Smaller Training Set | A smaller size dataset to run quick experiments on | 5-20% of the training set |
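The percentages in the table translate into concrete counts. A quick sketch with a hypothetical 1,000-image dataset and an 80/20 split (holding out half the test set for validation, and 10% of the training set for quick experiments):

```python
total_images = 1_000  # hypothetical dataset size

train_count = int(0.8 * total_images)       # 80% for training
test_count = total_images - train_count     # remaining 20% for testing
val_count = test_count // 2                 # 50% of the test data for validation
small_train_count = int(0.1 * train_count)  # 10% of training for quick experiments

print(train_count, test_count, val_count, small_train_count)  # 800 200 100 80
```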
Our dog dataset already comes with specified training and test set splits.
So we'll stick with those.
But we'll also create a smaller training set (a random 10% of the training data) so we can stick to the machine learning engineer's motto of experiment, experiment, experiment! and run quicker experiments.
Note: One of the most important things in machine learning is being able to experiment quickly. As in, try a new model, try a new set of hyperparameters or try a new training setup. When you start out, you want the time between your experiments to be as small as possible so you can quickly figure out what doesn't work so you can spend more time on and run larger experiments with what does work.
As previously discussed, we're working towards a directory structure of:
images_split/
├── train/
│ ├── class_1/
│ │ ├── train_image1.jpg
│ │ ├── train_image2.jpg
│ │ └── ...
│ ├── class_2/
│ │ ├── train_image1.jpg
│ │ ├── train_image2.jpg
│ │ └── ...
└── test/
├── class_1/
│ ├── test_image1.jpg
│ ├── test_image2.jpg
│ └── ...
├── class_2/
│ ├── test_image1.jpg
│ ├── test_image2.jpg
...
So let's write some code to:
- Create an images_split/train/ directory.
- Create an images_split/test/ directory.
- Make a directory inside each of images_split/train/ and images_split/test/ for each of the dog breed classes.
We can make each of the directories we need using Path.mkdir().
For the dog breed directories, we'll loop through the list of dog_names and create a folder for each inside the images_split/train/ and images_split/test/ directories.
from pathlib import Path
# Define the target directory for image splits to go
images_split_dir = Path("images_split")
# Define the training and test directories
train_dir = images_split_dir / "train"
test_dir = images_split_dir / "test"
# Using Path.mkdir with exist_ok=True ensures the directory is created only if it doesn't exist
train_dir.mkdir(parents=True, exist_ok=True)
test_dir.mkdir(parents=True, exist_ok=True)
print(f"Directory {train_dir} is ensured to exist.")
print(f"Directory {test_dir} is ensured to exist.")
# Make a folder for each dog name
for dog_name in dog_names:
# Make training dir folder
train_class_dir = train_dir / dog_name
train_class_dir.mkdir(parents=True, exist_ok=True)
# print(f"Making directory: {train_class_dir}")
# Make testing dir folder
test_class_dir = test_dir / dog_name
test_class_dir.mkdir(parents=True, exist_ok=True)
# print(f"Making directory: {test_class_dir}")
# Make sure there are 120 subfolders in each
assert count_subfolders(train_dir) == len(dog_names)
assert count_subfolders(test_dir) == len(dog_names)
Directory images_split/train is ensured to exist. Directory images_split/test is ensured to exist.
Excellent!
We can check out the data split directories/folders we created by inspecting them in the files panel in Google Colab.
Alternatively, we can check the names of each by listing the subdirectories inside them.
# See the first 10 directories in the training split dir
sorted([str(dir_name) for dir_name in train_dir.iterdir() if dir_name.is_dir()])[:10]
['images_split/train/affenpinscher', 'images_split/train/afghan_hound', 'images_split/train/african_hunting_dog', 'images_split/train/airedale', 'images_split/train/american_staffordshire_terrier', 'images_split/train/appenzeller', 'images_split/train/australian_terrier', 'images_split/train/basenji', 'images_split/train/basset', 'images_split/train/beagle']
You might've noticed that all of our dog breed directories are empty.
Let's change that by getting some images in there.
To do so, we'll create a function called copy_files_to_target_dir() which will copy images from the Images directory into their respective directories inside images_split/train and images_split/test.
More specifically, it will:
- Take in a list of source files to copy (e.g. train_file_list) and a target directory to copy files to.
- Iterate through the list of source files to copy (we'll use tqdm, which comes installed with Google Colab, to create a progress bar of how many files have been copied).
- Convert the source file path to a Path object.
- Split the source file path and create a Path object for the destination folder (e.g. "n02112018-Pomeranian" -> "pomeranian").
- Get the target file name (e.g. "n02112018-Pomeranian/n02112018_6208.jpg" -> "n02112018_6208.jpg").
- Create a destination path for the source file to be copied to (e.g. images_split/train/pomeranian/n02112018_6208.jpg).
- Ensure the destination directory exists, similar to the step we took in the previous section (you can't copy files to a directory that doesn't exist).
- Print out the progress of copying (if necessary).
- Copy the source file to the destination using Python's shutil.copy2(src, dst).
from pathlib import Path
from shutil import copy2
from tqdm.auto import tqdm
# 1. Take in a list of source files to copy and a target directory
def copy_files_to_target_dir(file_list: list[str],
target_dir: str,
images_dir: str = "Images",
verbose: bool = False) -> None:
"""
Copies a list of files from the images directory to a target directory.
Parameters:
file_list (list[str]): A list of file paths to copy.
target_dir (str): The destination directory path where files will be copied.
images_dir (str, optional): The directory path where the images are currently stored. Defaults to 'Images'.
verbose (bool, optional): If set to True, the function will print out the file paths as they are being copied. Defaults to False.
Returns:
None
"""
# 2. Iterate through source files
for file in tqdm(file_list):
# 3. Convert file path to a Path object
source_file_path = Path(images_dir) / Path(file)
# 4. Split the file path and create a Path object for the destination folder
# e.g. "n02112018-Pomeranian" -> "pomeranian"
file_class_name = folder_to_class_name_dict[Path(file).parts[0]]
# 5. Get the name of the target image
file_image_name = Path(file).name
# 6. Create the destination path
destination_file_path = Path(target_dir) / file_class_name / file_image_name
# 7. Ensure the destination directory exists (this is a safety check, can't copy an image to a file that doesn't exist)
destination_file_path.parent.mkdir(parents=True, exist_ok=True)
# 8. Print out copy message if necessary
if verbose:
print(f"[INFO] Copying: {source_file_path} to {destination_file_path}")
# 9. Copy the original path to the destination path
copy2(src=source_file_path, dst=destination_file_path)
Copying function created!
Let's test it out by copying the files in the train_file_list to train_dir.
# Copy training images from Images to images_split/train/...
copy_files_to_target_dir(file_list=train_file_list,
target_dir=train_dir,
verbose=False) # set this to True to get an output of the copy process
# (warning: this will output a large amount of text)
0%| | 0/12000 [00:00<?, ?it/s]
Woohoo!
Looks like our copying function copied 12000 training images into their respective directories inside images_split/train/.
How about we do the same for test_file_list and test_dir?
copy_files_to_target_dir(file_list=test_file_list,
target_dir=test_dir,
verbose=False)
0%| | 0/8580 [00:00<?, ?it/s]
Nice! 8580 testing images copied from Images to images_split/test/.
Let's write some code to check that the number of files in the train_file_list is the same as the number of images files in train_dir (and the same for the test files).
# Get a list of all .jpg paths in the train and test image directories
train_image_paths = list(train_dir.rglob("*.jpg"))
test_image_paths = list(test_dir.rglob("*.jpg"))
# Make sure the number of images in the training and test directories equals the number of files in their original lists
assert len(train_image_paths) == len(train_file_list)
assert len(test_image_paths) == len(test_file_list)
print(f"Number of images in {train_dir}: {len(train_image_paths)}")
print(f"Number of images in {test_dir}: {len(test_image_paths)}")
Number of images in images_split/train: 12000 Number of images in images_split/test: 8580
And adhering to the data explorer's motto of visualize, visualize, visualize!, let's plot some random images from the train_image_paths list.
# Plot 10 random images from the train_image_paths
plot_10_random_images_from_path_list(path_list=train_image_paths,
extract_title=False) # don't need to extract the title since the image directories are already named simply
TK - Making a 10% training dataset split¶
We've already split the data into training and test sets, so why might we want to make another split?
Well, remember the machine learner's motto?
Experiment, experiment, experiment!
We're going to make another training split which contains a random 10% (approximately 1,200 images, since the original training set has 12,000 images) of the data from the original training split.
Why?
Because whilst machine learning models generally perform better with more data, having more data means longer computation times.
And longer computation times means the time between our experiments gets longer.
Which is not what we want in the beginning.
In the beginning of any new machine learning project, your focus should be to reduce the amount of time between experiments as much as possible.
Why?
Because running more experiments means you can figure out what doesn't work.
And if you figure out what doesn't work, you can start working closer towards what does.
Once you find something that does work, you can start to scale up your experiments (more data, bigger models, longer training times - we'll see these later on).
- TK image - make an image diagram of the image split folder we're going to make e.g. train_10_percent...
To make our 10% training dataset, let's copy a random 10% of the existing training set to a new folder called images_split/train_10_percent.
Let's start by creating that folder.
# Create train_10_percent directory
train_10_percent_dir = images_split_dir / "train_10_percent"
train_10_percent_dir.mkdir(parents=True, exist_ok=True)
Now we should have 3 split folders inside images_split.
os.listdir(images_split_dir)
['train', 'train_10_percent', 'test']
Beautiful!
Now let's create a list of random training sample filepaths using Python's random.sample(). We'll want the total length of the list to equal 10% of the original training split.
To make things reproducible, we'll use a random seed (this is not 100% necessary, it just makes it so we get the same 10% of training image paths each time).
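A quick demonstration of what the seed buys us: reseeding before each call makes random.sample() return the exact same "random" selection (the population of integers here is a stand-in for our list of image paths):

```python
import random

population = list(range(100))  # stand-in for our list of training image paths

random.seed(42)
first_draw = random.sample(population, k=10)

random.seed(42)
second_draw = random.sample(population, k=10)

print(first_draw == second_draw)  # True -> same seed, same sample
```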
import random
# Set a random seed
random.seed(42)
# Get a 10% sample of the training image paths
train_image_paths_random_10_percent = random.sample(population=train_image_paths,
k=int(0.1*len(train_image_paths)))
# Check how many image paths we got
print(f"Original number of training image paths: {len(train_image_paths)}")
print(f"Number of 10% training image paths: {len(train_image_paths_random_10_percent)}")
print("First 5 random 10% training image paths:")
train_image_paths_random_10_percent[:5]
Original number of training image paths: 12000 Number of 10% training image paths: 1200 First 5 random 10% training image paths:
[PosixPath('images_split/train/silky_terrier/n02097658_6289.jpg'),
PosixPath('images_split/train/border_terrier/n02093754_5038.jpg'),
PosixPath('images_split/train/miniature_pinscher/n02107312_4613.jpg'),
PosixPath('images_split/train/afghan_hound/n02088094_4219.jpg'),
PosixPath('images_split/train/irish_setter/n02100877_2298.jpg')]
Random 10% training image paths acquired!
Let's copy them to the images_split/train_10_percent directory using similar code to our copy_files_to_target_dir() function.
# Copy training 10% split images from images_split/train/ to images_split/train_10_percent/...
for source_file_path in tqdm(train_image_paths_random_10_percent):
# Create the destination file path
destination_file_and_image_name = Path(*source_file_path.parts[-2:]) # "images_split/train/yorkshire_terrier/n02094433_2223.jpg" -> "yorkshire_terrier/n02094433_2223.jpg"
destination_file_path = train_10_percent_dir / destination_file_and_image_name # "yorkshire_terrier/n02094433_2223.jpg" -> "images_split/train_10_percent/yorkshire_terrier/n02094433_2223.jpg"
# If the target directory doesn't exist, make it
target_class_dir = destination_file_path.parent
if not target_class_dir.is_dir():
# print(f"Making directory: {target_class_dir}")
target_class_dir.mkdir(parents=True,
exist_ok=True)
# print(f"Copying: {source_file_path} to {destination_file_path}")
copy2(src=source_file_path,
dst=destination_file_path)
0%| | 0/1200 [00:00<?, ?it/s]
1200 images copied!
Let's check our training 10% set distribution and make sure we've got some images for each class.
We can use our count_images_in_subdirs() function to count the images in each of the dog breed folders in the train_10_percent_dir.
# Count images in train_10_percent_dir
train_10_percent_image_class_counts = count_images_in_subdirs(train_10_percent_dir)
train_10_percent_image_class_counts_df = pd.DataFrame(train_10_percent_image_class_counts).sort_values("image_count", ascending=True)
train_10_percent_image_class_counts_df.head()
| class_name | image_count | |
|---|---|---|
| 33 | collie | 3 |
| 23 | italian_greyhound | 4 |
| 61 | dingo | 4 |
| 64 | american_staffordshire_terrier | 4 |
| 100 | great_dane | 5 |
Okay, looks like a few classes have only a handful of images.
Let's make sure there are 120 subfolders by checking the length of the train_10_percent_image_class_counts_df.
# How many subfolders are there?
print(len(train_10_percent_image_class_counts_df))
120
Beautiful, our train 10% dataset split has a folder for each of the dog breed classes.
Note: Ideally our random 10% training set would have the same distribution per class as the original training set, however, for this example, we've taken a global random 10% rather than a random 10% per class. This is okay for now, however for more fine-grained tasks, you may want to make sure your smaller training set is better distributed.
For one last check, let's plot the distribution of our train 10% dataset.
# Plot distribution of train 10% dataset.
plt.figure(figsize=(14, 7))
train_10_percent_image_class_counts_df.plot(kind="bar",
x="class_name",
y="image_count",
legend=False,
ax=plt.gca()) # plt.gca() = "get current axis", get the plt we setup above and put the data there
# Add customization
plt.title("Train 10 Percent Image Counts by Class")
plt.ylabel("Image Count")
plt.xticks(rotation=90, # Rotate the x labels for better visibility
fontsize=8) # Make the font size smaller for easier reading
plt.tight_layout() # Ensure things fit nicely
plt.show()
Excellent! Our train 10% dataset distribution looks similar to the original training set distribution.
However, it could be better.
If we really wanted to, we could recreate the train 10% dataset with 10% of the images from each class rather than 10% of images globally.
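That per-class version isn't much extra code. Here's a sketch (the function name is ours, not from the notebook), assuming the same images_split/train/<class_name>/*.jpg layout we created above:

```python
import random
from pathlib import Path
from typing import List

def sample_10_percent_per_class(train_dir: Path, seed: int = 42) -> List[Path]:
    """Sample ~10% of the images from each class folder (rather than 10% globally)."""
    random.seed(seed)
    sampled_paths = []
    # Iterate through each class folder (e.g. images_split/train/beagle)
    for class_dir in sorted(d for d in train_dir.iterdir() if d.is_dir()):
        class_images = sorted(class_dir.glob("*.jpg"))
        k = max(1, int(0.1 * len(class_images)))  # always keep at least 1 image per class
        sampled_paths.extend(random.sample(class_images, k=k))
    return sampled_paths
```

The sampled paths could then be copied with the same shutil.copy2() loop we used above.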
TK - Turning datasets into TensorFlow Dataset(s)¶
Alright, we've spent a bunch of time getting our dog images into different folders.
But how do we get the images from different folders into a machine learning model?
Well, like the other machine learning models we've built, we need a way to turn our images into numbers.
Specifically, we're going to turn our images into tensors.
That's where the "Tensor" comes from in "TensorFlow".
A tensor is a way to numerically represent something (where something can be almost anything you can think of, text, images, audio, rows and columns).
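For instance, a tiny 2x2 pixel colour image is just a 3-dimensional array of numbers (sketched here with NumPy; a TensorFlow tensor holds the same kind of values):

```python
import numpy as np

# A hypothetical 2x2 RGB image: shape = (height, width, colour_channels)
tiny_image = np.array([
    [[255, 0, 0], [0, 255, 0]],      # red pixel,  green pixel
    [[0, 0, 255], [255, 255, 255]],  # blue pixel, white pixel
], dtype=np.uint8)

print(tiny_image.shape)  # (2, 2, 3)
```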
There are several different ways to load data into TensorFlow.
But the formula is the same across data types: have data -> use TensorFlow to turn it into tensors.
The reason why we spent time getting our data into the standard image classification format (where the class name is the folder name) is because TensorFlow includes several utility functions to load data from this directory format.
| Function | Description |
|---|---|
| tf.keras.utils.image_dataset_from_directory() | Creates a tf.data.Dataset from image files in a directory. |
| tf.keras.utils.audio_dataset_from_directory() | Creates a tf.data.Dataset from audio files in a directory. |
| tf.keras.utils.text_dataset_from_directory() | Creates a tf.data.Dataset from text files in a directory. |
| tf.keras.utils.timeseries_dataset_from_array() | Creates a dataset of sliding windows over a timeseries provided as an array. |
What is a tf.data.Dataset?
It's TensorFlow's efficient way to store a potentially large set of elements.
As machine learning datasets can get quite large, you need an efficient way to store and load them.
This is what the tf.data.Dataset API provides.
And it's what we'd like to turn our dog images into.
Since we're working with images, we can do so with tf.keras.utils.image_dataset_from_directory().
We'll pass in the following parameters:
- directory = the target directory we'd like to turn into a tf.data.Dataset.
- label_mode = the kind of labels we'd like to use, in our case "categorical" since we're dealing with a multi-class classification problem.
- batch_size = the number of images we'd like our model to see at a time (due to computation limitations, our model won't be able to look at every image at once); generally 32 is a good value to start.
- image_size = the size we'd like to shape our images to before we feed them to our model (height x width).
- shuffle = whether we'd like our dataset to be shuffled to randomize the order.
- seed = if we're shuffling the order in a random fashion, do we want that to be reproducible?
Note: Values such as
batch_size and image_size are known as hyperparameters, meaning they're values that you decide what to set them to. As for the best value for a given hyperparameter, that depends highly on the data you're working with, the problem space and the compute capabilities you've got available. Best to experiment!
With all this being said, let's see it in practice!
We'll make 3 tf.data.Datasets: train_10_percent_ds, train_ds and test_ds.
import tensorflow as tf
# Create constants
IMG_SIZE = (224, 224)
BATCH_SIZE = 32
SEED = 42
# Create train 10% dataset
train_10_percent_ds = tf.keras.utils.image_dataset_from_directory(
directory=train_10_percent_dir,
label_mode="categorical", # turns labels into one-hot representations (e.g. [0, 0, 1, ..., 0, 0])
batch_size=BATCH_SIZE,
image_size=IMG_SIZE,
shuffle=True, # shuffle training datasets to prevent learning of order
seed=SEED
)
# Create full train dataset
train_ds = tf.keras.utils.image_dataset_from_directory(
directory=train_dir,
label_mode="categorical",
batch_size=BATCH_SIZE,
image_size=IMG_SIZE,
shuffle=True,
seed=SEED
)
# Create test dataset
test_ds = tf.keras.utils.image_dataset_from_directory(
directory=test_dir,
label_mode="categorical",
batch_size=BATCH_SIZE,
image_size=IMG_SIZE,
shuffle=False, # don't need to shuffle the test dataset (this makes evaluations easier)
seed=SEED
)
Found 1200 files belonging to 120 classes. Found 12000 files belonging to 120 classes. Found 8580 files belonging to 120 classes.
Note: If you're working with similar styles of data (e.g. all dog photos), it's best practice to shuffle training datasets to prevent the model from learning any order in the data, no need to shuffle testing datasets (this makes for easier evaluation).
tf.data.Datasets created!
Let's check out one of them.
train_10_percent_ds
<_PrefetchDataset element_spec=(TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None, 120), dtype=tf.float32, name=None))>
You'll notice a few things going on here.
Essentially, we've got a collection of tuples:
- The image tensor(s) - TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32, name=None) where (None, 224, 224, 3) is the shape of the image tensor (None is the batch size, (224, 224) is the IMG_SIZE we set and 3 is the number of colour channels, as in red, green, blue or RGB, since our images are in colour).
- The label tensor(s) - TensorSpec(shape=(None, 120), dtype=tf.float32, name=None) where None is the batch size and 120 is the number of labels we're using.
The batch size often appears as None since it's flexible and can change on the fly.
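One concrete reason the batch dimension stays flexible: the final batch is usually smaller than the rest. For example, with our 8,580 test images and a batch size of 32:

```python
num_images, batch_size = 8580, 32

# divmod gives the number of full batches and the size of the leftover batch
full_batches, last_batch_size = divmod(num_images, batch_size)
print(full_batches, last_batch_size)  # 268 full batches, then one final batch of 4
```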
Each batch of images is associated with a batch of labels.
Instead of talking about it, let's check out what a single batch looks like.
We can do so by turning the tf.data.Dataset into an iterable with Python's built-in iter() and then getting the "next" batch with next().
# What does a single batch look like?
image_batch, label_batch = next(iter(train_ds))
image_batch.shape, label_batch.shape
(TensorShape([32, 224, 224, 3]), TensorShape([32, 120]))
Nice!
We get back a single batch of images and labels.
Looks like a single image_batch has a shape of [32, 224, 224, 3] ([batch_size, height, width, colour_channels]).
And our labels have a shape of [32, 120] ([batch_size, labels]).
These are numerical representations of our data images and labels!
Note: The shape of a tensor does not necessarily reflect the values inside a tensor.
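Since our labels are one-hot encoded, a label tensor is all zeros except for a single 1 at the index of the target class. A minimal NumPy sketch (index 95 is used as a hypothetical target class):

```python
import numpy as np

num_classes = 120
one_hot_label = np.zeros(num_classes)
one_hot_label[95] = 1.0  # hypothetical target class at index 95

# argmax recovers the class index from the one-hot vector
class_index = int(np.argmax(one_hot_label))
print(class_index)  # 95
```

That index can then be used to look up the class name, e.g. dog_names[class_index].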
We can further inspect our data by looking at a single sample.
# Get a single sample from a single batch
print(f"Single image tensor:\n{image_batch[0]}\n")
print(f"Single label tensor: {label_batch[0]}") # notice the 1 is the index of the target label (our labels are one-hot encoded)
print(f"Single sample class name: {dog_names[tf.argmax(label_batch[0])]}")
Single image tensor: [[[196.61607 174.61607 160.61607 ] [197.84822 175.84822 161.84822 ] [200. 178. 164. ] ... [ 60.095097 79.75804 45.769207] [ 61.83293 71.22575 63.288315] [ 77.65755 83.65755 81.65755 ]] [[196. 174. 160. ] [197.83876 175.83876 161.83876 ] [199.07945 177.07945 163.07945 ] ... [ 94.573715 110.55229 83.59694 ] [125.869865 135.26268 127.33472 ] [122.579605 128.5796 126.579605]] [[195.73691 173.73691 159.73691 ] [196.896 174.896 160.896 ] [199. 177. 163. ] ... [ 26.679413 38.759026 20.500835] [ 24.372307 31.440136 26.675896] [ 20.214453 26.214453 24.214453]] ... [[ 61.57369 70.18976 104.72547 ] [189.91965 199.61607 213.28572 ] [247.26637 255. 252.70387 ] ... [113.40158 83.40158 57.40158 ] [110.75214 78.75214 53.752136] [107.37048 75.37048 50.370483]] [[ 61.27007 69.88614 104.42185 ] [188.93079 198.62721 212.29686 ] [246.33257 255. 251.77007 ] ... [110.88623 80.88623 54.88623 ] [102.763245 70.763245 45.763245] [ 99.457634 67.457634 42.457638]] [[ 60.25893 68.875 103.41071 ] [188.58261 198.27904 211.94868 ] [245.93112 254.6097 251.36862 ] ... [105.02222 75.02222 49.022217] [109.11186 77.11186 52.111866] [106.56936 74.56936 49.56936 ]]] Single label tensor: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] Single sample class name: schipperke
Woah!!
We've got a numerical representation of a dog image!
This is exactly the kind of format our model will want.
Can we do the reverse?
Instead of image -> numbers, can we go from numbers -> image?
You bet.
TK - Visualizing images from our TensorFlow Dataset¶
Let's follow the data explorers motto once again and visualize, visualize, visualize!
How about we turn our single sample from tensor format to image format?
We can do so by passing the single sample image tensor to matplotlib's plt.imshow() (we'll also need to convert its datatype from float32 to uint8 to avoid matplotlib colour range issues).
plt.imshow(image_batch[0].numpy().astype("uint8")) # convert to uint8 to avoid matplotlib colour range issues
plt.title(dog_names[tf.argmax(label_batch[0])])
plt.axis("off");
How about we plot multiple images?
We can do so by first setting up a plot with multiple subplots.
And then we can iterate through our dataset with tf.data.Dataset.take(count=1) which will "take" 1 batch of data that we can then index into for each subplot.
# Create multiple subplots
fig, axes = plt.subplots(nrows=2, ncols=5, figsize=(20, 10))
# Iterate through a single batch and plot images
for images, labels in train_ds.take(count=1): # note: because our training data is shuffled, each "take" will be different
for i, ax in enumerate(axes.flat):
ax.imshow(images[i].numpy().astype("uint8"))
ax.set_title(dog_names[tf.argmax(labels[i])])
ax.axis("off")
Aren't those good looking dogs!
TK - Getting labels from our TensorFlow Dataset¶
Since our data is now in tf.data.Dataset format, there are a couple of important attributes we can pull from it if necessary.
The first is the collection of file paths associated with a tf.data.Dataset.
These are accessible by the .file_paths attribute.
Note: You can often see a list of associated methods and attributes of a variable/class in Google Colab (or other IDEs) by pressing TAB after it (e.g. type variable_name. + TAB).
# Get the first 5 file paths of the training dataset
train_ds.file_paths[:5]
['images_split/train/boston_bull/n02096585_1753.jpg', 'images_split/train/kerry_blue_terrier/n02093859_855.jpg', 'images_split/train/border_terrier/n02093754_2281.jpg', 'images_split/train/rottweiler/n02106550_11823.jpg', 'images_split/train/airedale/n02096051_5884.jpg']
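As a side note, because these datasets were created from a folder-per-class directory structure, the class of any sample is recoverable as the parent folder name of its file path. A minimal sketch with Python's pathlib, using one of the file paths printed above:

```python
from pathlib import Path

# One of the training file paths from above (the folder name is the class name)
example_path = "images_split/train/boston_bull/n02096585_1753.jpg"

# The parent folder of the image file is the target class
class_name = Path(example_path).parent.name
print(class_name)  # boston_bull
```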
We can also get the class names associated with a dataset using .class_names (TensorFlow has read these from the names of our target folders in the images_split directory).
# Get the class names TensorFlow has read from the target directory
class_names = train_ds.class_names
class_names[:5]
['affenpinscher', 'afghan_hound', 'african_hunting_dog', 'airedale', 'american_staffordshire_terrier']
And we can make sure the class names are the same across our datasets by comparing them.
assert set(train_10_percent_ds.class_names) == set(train_ds.class_names) == set(test_ds.class_names)
TK - Configuring our datasets for performance¶
There's one last step we're going to do before we build our first TensorFlow model.
And that's configuring our datasets for performance.
More specifically, we're going to focus on following the TensorFlow guide for Better performance with the tf.data API.
Why?
Because data loading is one of the biggest bottlenecks in machine learning.
Modern GPUs can perform calculations (matrix multiplications) to find patterns in data quite quickly.
However, for the GPU to perform such calculations, the data needs to be there.
Good news for us is that if we follow the TensorFlow tf.data best practices, TensorFlow will take care of all these optimizations and hardware acceleration for us.
We're going to call three methods on our dataset to optimize it for performance:
- cache() - Cache the elements of the dataset in memory or in a target folder (speeds up loading).
- shuffle() - Shuffle a set number of samples in preparation for loading (this means our samples and batches of samples will be shuffled). For example, setting shuffle(buffer_size=1000) will prepare and shuffle 1000 elements of data at a time.
- prefetch() - Prefetch the next batch of data and prepare it for computation while the previous one is being computed on (can scale to multiple prefetches depending on hardware availability). TensorFlow can automatically configure how many elements/batches to prefetch by setting buffer_size=tf.data.AUTOTUNE.
Resource: For more performance tips on loading datasets in TensorFlow, see the Datasets Performance tips guide.
In our case, let's start by calling cache() on our datasets to save the loaded samples to memory.
We'll then shuffle() the training splits, using buffer_size=10*BATCH_SIZE for the 10% training split and buffer_size=100*BATCH_SIZE for the full training set. Why these numbers? They're simply what I landed on via experimentation, so feel free to try values that may work better. Ideally, if your dataset isn't too large, you'd shuffle all possible samples with buffer_size=dataset.cardinality().
We won't call shuffle() on the testing dataset since it isn't required.
And we'll call prefetch(buffer_size=tf.data.AUTOTUNE) on each of our datasets to automatically load and prepare a number of data batches.
AUTOTUNE = tf.data.AUTOTUNE # let TensorFlow find the best values to use automatically
# Shuffle and optimize performance on training datasets
# Note: these methods can be chained together and will have the same effect as calling them individually
train_10_percent_ds = train_10_percent_ds.cache().shuffle(buffer_size=10*BATCH_SIZE).prefetch(buffer_size=AUTOTUNE)
train_ds = train_ds.cache().shuffle(buffer_size=100*BATCH_SIZE).prefetch(buffer_size=AUTOTUNE)
# Don't need to shuffle test datasets (for easier evaluation)
test_ds = test_ds.cache().prefetch(buffer_size=AUTOTUNE)
Dataset performance optimized!
Time to create our first neural network with TensorFlow!
TK - Creating a neural network with TensorFlow¶
We've spent lots of time preparing the data.
That's because getting your data ready for a model is often the largest part of a machine learning problem.
Thanks to modern frameworks like TensorFlow, when you've got your data in order, building a deep learning model to find patterns in your data can be one of the easier steps of the process.
When you hear people talk about deep learning, they're often referring to neural networks.
Neural networks are one of the most flexible machine learning models there are.
You can create a neural network to fit almost any kind of data.
The "deep" in deep learning refers to the many layers that can be contained inside a neural network.
A neural network often follows the structure of:
Input layer -> Middle layer(s) -> Output layer
TK - image of neural network with example
The input layer takes in the data, the middle layer(s) perform calculations on it and (hopefully) learn patterns (also called weights/biases) that represent the data, and the output layer performs a final transformation on the learned patterns to make them usable in human applications.
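To make the structure concrete, here's a tiny illustrative network built with tf.keras.Sequential. This is a sketch only (the layer sizes here are arbitrary, and it's not our dog vision model):

```python
import tensorflow as tf

# A minimal network following: input -> middle layer(s) -> output
# (illustrative sketch only, not our dog vision model)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),    # middle layer: learns patterns (weights/biases)
    tf.keras.layers.Dense(3, activation="softmax")  # output layer: turns patterns into 3 class probabilities
])

# The input layer is created implicitly on the first call
# (here: a batch of 2 samples with 4 features each)
outputs = model(tf.random.uniform(shape=(2, 4)))
print(outputs.shape)  # (2, 3) -> one probability distribution per sample
```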
What goes into the middle layer(s)?
That's an excellent question.
Because there are so many different options.
But two of the most popular modern kinds of neural network are Convolutional Neural Networks (CNNs) and Transformers (the Transformer is the "T" in GPT, Generative Pretrained Transformer).
| Architecture | Description | Example Layers | Problem Examples |
|---|---|---|---|
| Transformer | A combination of fully connected layers as well as attention-based layers. | tf.keras.layers.Attention, tf.keras.layers.Dense | NLP, Machine Translation, Computer Vision |
| Convolutional Neural Network (CNN) | A combination of fully connected layers as well as convolution-based layers. | tf.keras.layers.Conv2D, tf.keras.layers.Dense | Computer Vision, Audio Processing |
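To get a feel for the CNN row of the table, here's a single convolutional layer in isolation (the parameter values are arbitrary, chosen for illustration only). Notice how it slides over an image and changes its shape:

```python
import tensorflow as tf

# A single convolutional layer (values here are illustrative, not tuned)
conv_layer = tf.keras.layers.Conv2D(filters=16,     # number of pattern detectors to learn
                                    kernel_size=3,  # each detector looks at 3x3 pixel patches
                                    strides=2,      # move 2 pixels at a time (halves height/width)
                                    padding="same") # pad edges so the output size divides evenly

# Pass a fake batch of 1 image (224x224 pixels, 3 colour channels) through the layer
fake_image_batch = tf.random.uniform(shape=(1, 224, 224, 3))
print(conv_layer(fake_image_batch).shape)  # (1, 112, 112, 16)
```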
Because our problem is in the computer vision space, we're going to use a CNN.
And instead of crafting our own CNN from scratch, we're going to take an existing CNN model and apply it to our own problem, harnessing the wonderful superpower of transfer learning.
Note: You can build and use working neural networks with TensorFlow without knowing the intricate details of what's going on behind the scenes (that's the approach we're focused on). For an idea of the mathematical operations that make neural networks work, I'd recommend going through 3Blue1Brown's YouTube series on Neural Networks.
TK - The magic of transfer learning¶
Transfer learning is the process of getting an existing working model and adjusting it to your own problem.
This works particularly well for neural networks.
The main benefit of transfer learning is being able to get better results in less time with less data.
How?
An existing model may have the following features:
- Trained on lots of data (in the case of computer vision, existing models are often pretrained on ImageNet, a dataset of 1M+ images).
- Crafted by expert researchers (large universities and companies such as Google and Meta often open-source their best models for others to try and use).
- Trained on lots of computing hardware (the larger the model and the larger the dataset, the more compute power you need, and not everyone has access to 100s of GPUs).
- Proven to perform well on a given task through several studies (this means it has a good chance of performing well on your task if it's similar).
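In code, the transfer learning pattern generally boils down to freezing a pretrained "base" model's layers and training only a small number of new layers on top. Here's a sketch of the idea using a tiny stand-in base (in practice, the base would be a pretrained model, which we'll set up shortly):

```python
import tensorflow as tf

# A tiny stand-in "base" model (in practice: a pretrained model, e.g. from tf.keras.applications)
base = tf.keras.Sequential([tf.keras.layers.Dense(8, activation="relu")])
base(tf.random.uniform(shape=(1, 4)))  # call once so the base builds its weights
base.trainable = False                 # freeze the base: keep its (pre)learned patterns fixed

# Stack a new trainable output layer on top for our own problem (e.g. 120 dog breeds)
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(120, activation="softmax")
])
model(tf.random.uniform(shape=(1, 4)))  # build the full model

# Only the new output layer's weights will update during training
print(len(model.trainable_variables))  # 2 -> the new layer's kernel + bias
```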
You may be thinking: okay, this all sounds incredible, but where can I get pretrained models?
And the good news is, there are plenty of places to find pretrained models!
- tf.keras.applications - A module built into TensorFlow and Keras with a series of pretrained models ready to use.
- Hugging Face Models Hub - A large collection of pretrained models for a wide range of tasks, from computer vision to natural language processing to audio processing.
- Kaggle Models - A huge collection of different pretrained models for many different tasks.
Note: For most new machine learning problems, if you're looking to get good results quickly, you should generally look for a pretrained model similar to your problem and use transfer learning to adapt it to your own domain.
Since we're focused on TensorFlow, we're going to be using a pretrained model from tf.keras.applications.
More specifically, we're going to take the tf.keras.applications.efficientnet_v2.EfficientNetV2B0() model from the 2021 machine learning paper EfficientNetV2: Smaller Models and Faster Training from Google Research and apply it to our own problem.
This model has been trained on ImageNet (1M+ images across 1000 classes) so it has a good baseline understanding of patterns in images across a wide domain.
We'll see if we can adjust those patterns slightly to our dog images.
Let's create an instance of it and call it base_model (I'll explain why next).
# Create the input shape to our model
INPUT_SHAPE = (*IMG_SIZE, 3)
base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
include_top=True, # do we want to include the top (output) layer? (ImageNet has 1000 classes, so the top layer is formulated for this; later we'll create our own top layer)
include_preprocessing=True, # do we want the network to preprocess our data into the right format for us? (yes)
weights="imagenet", # do we want the network to come with pretrained weights? (yes)
input_shape=INPUT_SHAPE # what is the input shape of our data we're going to pass to the network? (224, 224, 3) -> (height, width, colour_channels)
)
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/efficientnet_v2/efficientnetv2-b0.h5 29403144/29403144 [==============================] - 1s 0us/step
Base model created!
We can find out information about our base model by calling base_model.summary().
base_model.summary()
Model: "efficientnetv2-b0"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 224, 224, 3)] 0 []
rescaling (Rescaling) (None, 224, 224, 3) 0 ['input_1[0][0]']
normalization (Normalizati (None, 224, 224, 3) 0 ['rescaling[0][0]']
on)
stem_conv (Conv2D) (None, 112, 112, 32) 864 ['normalization[0][0]']
stem_bn (BatchNormalizatio (None, 112, 112, 32) 128 ['stem_conv[0][0]']
n)
stem_activation (Activatio (None, 112, 112, 32) 0 ['stem_bn[0][0]']
n)
block1a_project_conv (Conv (None, 112, 112, 16) 4608 ['stem_activation[0][0]']
2D)
block1a_project_bn (BatchN (None, 112, 112, 16) 64 ['block1a_project_conv[0][0]']
ormalization)
block1a_project_activation (None, 112, 112, 16) 0 ['block1a_project_bn[0][0]']
(Activation)
block2a_expand_conv (Conv2 (None, 56, 56, 64) 9216 ['block1a_project_activation[0
D) ][0]']
block2a_expand_bn (BatchNo (None, 56, 56, 64) 256 ['block2a_expand_conv[0][0]']
rmalization)
block2a_expand_activation (None, 56, 56, 64) 0 ['block2a_expand_bn[0][0]']
(Activation)
block2a_project_conv (Conv (None, 56, 56, 32) 2048 ['block2a_expand_activation[0]
2D) [0]']
block2a_project_bn (BatchN (None, 56, 56, 32) 128 ['block2a_project_conv[0][0]']
ormalization)
block2b_expand_conv (Conv2 (None, 56, 56, 128) 36864 ['block2a_project_bn[0][0]']
D)
block2b_expand_bn (BatchNo (None, 56, 56, 128) 512 ['block2b_expand_conv[0][0]']
rmalization)
block2b_expand_activation (None, 56, 56, 128) 0 ['block2b_expand_bn[0][0]']
(Activation)
block2b_project_conv (Conv (None, 56, 56, 32) 4096 ['block2b_expand_activation[0]
2D) [0]']
block2b_project_bn (BatchN (None, 56, 56, 32) 128 ['block2b_project_conv[0][0]']
ormalization)
block2b_drop (Dropout) (None, 56, 56, 32) 0 ['block2b_project_bn[0][0]']
block2b_add (Add) (None, 56, 56, 32) 0 ['block2b_drop[0][0]',
'block2a_project_bn[0][0]']
block3a_expand_conv (Conv2 (None, 28, 28, 128) 36864 ['block2b_add[0][0]']
D)
block3a_expand_bn (BatchNo (None, 28, 28, 128) 512 ['block3a_expand_conv[0][0]']
rmalization)
block3a_expand_activation (None, 28, 28, 128) 0 ['block3a_expand_bn[0][0]']
(Activation)
block3a_project_conv (Conv (None, 28, 28, 48) 6144 ['block3a_expand_activation[0]
2D) [0]']
block3a_project_bn (BatchN (None, 28, 28, 48) 192 ['block3a_project_conv[0][0]']
ormalization)
block3b_expand_conv (Conv2 (None, 28, 28, 192) 82944 ['block3a_project_bn[0][0]']
D)
block3b_expand_bn (BatchNo (None, 28, 28, 192) 768 ['block3b_expand_conv[0][0]']
rmalization)
block3b_expand_activation (None, 28, 28, 192) 0 ['block3b_expand_bn[0][0]']
(Activation)
block3b_project_conv (Conv (None, 28, 28, 48) 9216 ['block3b_expand_activation[0]
2D) [0]']
block3b_project_bn (BatchN (None, 28, 28, 48) 192 ['block3b_project_conv[0][0]']
ormalization)
block3b_drop (Dropout) (None, 28, 28, 48) 0 ['block3b_project_bn[0][0]']
block3b_add (Add) (None, 28, 28, 48) 0 ['block3b_drop[0][0]',
'block3a_project_bn[0][0]']
block4a_expand_conv (Conv2 (None, 28, 28, 192) 9216 ['block3b_add[0][0]']
D)
block4a_expand_bn (BatchNo (None, 28, 28, 192) 768 ['block4a_expand_conv[0][0]']
rmalization)
block4a_expand_activation (None, 28, 28, 192) 0 ['block4a_expand_bn[0][0]']
(Activation)
block4a_dwconv2 (Depthwise (None, 14, 14, 192) 1728 ['block4a_expand_activation[0]
Conv2D) [0]']
block4a_bn (BatchNormaliza (None, 14, 14, 192) 768 ['block4a_dwconv2[0][0]']
tion)
block4a_activation (Activa (None, 14, 14, 192) 0 ['block4a_bn[0][0]']
tion)
block4a_se_squeeze (Global (None, 192) 0 ['block4a_activation[0][0]']
AveragePooling2D)
block4a_se_reshape (Reshap (None, 1, 1, 192) 0 ['block4a_se_squeeze[0][0]']
e)
block4a_se_reduce (Conv2D) (None, 1, 1, 12) 2316 ['block4a_se_reshape[0][0]']
block4a_se_expand (Conv2D) (None, 1, 1, 192) 2496 ['block4a_se_reduce[0][0]']
block4a_se_excite (Multipl (None, 14, 14, 192) 0 ['block4a_activation[0][0]',
y) 'block4a_se_expand[0][0]']
block4a_project_conv (Conv (None, 14, 14, 96) 18432 ['block4a_se_excite[0][0]']
2D)
block4a_project_bn (BatchN (None, 14, 14, 96) 384 ['block4a_project_conv[0][0]']
ormalization)
block4b_expand_conv (Conv2 (None, 14, 14, 384) 36864 ['block4a_project_bn[0][0]']
D)
block4b_expand_bn (BatchNo (None, 14, 14, 384) 1536 ['block4b_expand_conv[0][0]']
rmalization)
block4b_expand_activation (None, 14, 14, 384) 0 ['block4b_expand_bn[0][0]']
(Activation)
block4b_dwconv2 (Depthwise (None, 14, 14, 384) 3456 ['block4b_expand_activation[0]
Conv2D) [0]']
block4b_bn (BatchNormaliza (None, 14, 14, 384) 1536 ['block4b_dwconv2[0][0]']
tion)
block4b_activation (Activa (None, 14, 14, 384) 0 ['block4b_bn[0][0]']
tion)
block4b_se_squeeze (Global (None, 384) 0 ['block4b_activation[0][0]']
AveragePooling2D)
block4b_se_reshape (Reshap (None, 1, 1, 384) 0 ['block4b_se_squeeze[0][0]']
e)
block4b_se_reduce (Conv2D) (None, 1, 1, 24) 9240 ['block4b_se_reshape[0][0]']
block4b_se_expand (Conv2D) (None, 1, 1, 384) 9600 ['block4b_se_reduce[0][0]']
block4b_se_excite (Multipl (None, 14, 14, 384) 0 ['block4b_activation[0][0]',
y) 'block4b_se_expand[0][0]']
block4b_project_conv (Conv (None, 14, 14, 96) 36864 ['block4b_se_excite[0][0]']
2D)
block4b_project_bn (BatchN (None, 14, 14, 96) 384 ['block4b_project_conv[0][0]']
ormalization)
block4b_drop (Dropout) (None, 14, 14, 96) 0 ['block4b_project_bn[0][0]']
block4b_add (Add) (None, 14, 14, 96) 0 ['block4b_drop[0][0]',
'block4a_project_bn[0][0]']
block4c_expand_conv (Conv2 (None, 14, 14, 384) 36864 ['block4b_add[0][0]']
D)
block4c_expand_bn (BatchNo (None, 14, 14, 384) 1536 ['block4c_expand_conv[0][0]']
rmalization)
block4c_expand_activation (None, 14, 14, 384) 0 ['block4c_expand_bn[0][0]']
(Activation)
block4c_dwconv2 (Depthwise (None, 14, 14, 384) 3456 ['block4c_expand_activation[0]
Conv2D) [0]']
block4c_bn (BatchNormaliza (None, 14, 14, 384) 1536 ['block4c_dwconv2[0][0]']
tion)
block4c_activation (Activa (None, 14, 14, 384) 0 ['block4c_bn[0][0]']
tion)
block4c_se_squeeze (Global (None, 384) 0 ['block4c_activation[0][0]']
AveragePooling2D)
block4c_se_reshape (Reshap (None, 1, 1, 384) 0 ['block4c_se_squeeze[0][0]']
e)
block4c_se_reduce (Conv2D) (None, 1, 1, 24) 9240 ['block4c_se_reshape[0][0]']
block4c_se_expand (Conv2D) (None, 1, 1, 384) 9600 ['block4c_se_reduce[0][0]']
block4c_se_excite (Multipl (None, 14, 14, 384) 0 ['block4c_activation[0][0]',
y) 'block4c_se_expand[0][0]']
block4c_project_conv (Conv (None, 14, 14, 96) 36864 ['block4c_se_excite[0][0]']
2D)
block4c_project_bn (BatchN (None, 14, 14, 96) 384 ['block4c_project_conv[0][0]']
ormalization)
block4c_drop (Dropout) (None, 14, 14, 96) 0 ['block4c_project_bn[0][0]']
block4c_add (Add) (None, 14, 14, 96) 0 ['block4c_drop[0][0]',
'block4b_add[0][0]']
block5a_expand_conv (Conv2 (None, 14, 14, 576) 55296 ['block4c_add[0][0]']
D)
block5a_expand_bn (BatchNo (None, 14, 14, 576) 2304 ['block5a_expand_conv[0][0]']
rmalization)
block5a_expand_activation (None, 14, 14, 576) 0 ['block5a_expand_bn[0][0]']
(Activation)
block5a_dwconv2 (Depthwise (None, 14, 14, 576) 5184 ['block5a_expand_activation[0]
Conv2D) [0]']
block5a_bn (BatchNormaliza (None, 14, 14, 576) 2304 ['block5a_dwconv2[0][0]']
tion)
block5a_activation (Activa (None, 14, 14, 576) 0 ['block5a_bn[0][0]']
tion)
block5a_se_squeeze (Global (None, 576) 0 ['block5a_activation[0][0]']
AveragePooling2D)
block5a_se_reshape (Reshap (None, 1, 1, 576) 0 ['block5a_se_squeeze[0][0]']
e)
block5a_se_reduce (Conv2D) (None, 1, 1, 24) 13848 ['block5a_se_reshape[0][0]']
block5a_se_expand (Conv2D) (None, 1, 1, 576) 14400 ['block5a_se_reduce[0][0]']
block5a_se_excite (Multipl (None, 14, 14, 576) 0 ['block5a_activation[0][0]',
y) 'block5a_se_expand[0][0]']
block5a_project_conv (Conv (None, 14, 14, 112) 64512 ['block5a_se_excite[0][0]']
2D)
block5a_project_bn (BatchN (None, 14, 14, 112) 448 ['block5a_project_conv[0][0]']
ormalization)
block5b_expand_conv (Conv2 (None, 14, 14, 672) 75264 ['block5a_project_bn[0][0]']
D)
block5b_expand_bn (BatchNo (None, 14, 14, 672) 2688 ['block5b_expand_conv[0][0]']
rmalization)
block5b_expand_activation (None, 14, 14, 672) 0 ['block5b_expand_bn[0][0]']
(Activation)
block5b_dwconv2 (Depthwise (None, 14, 14, 672) 6048 ['block5b_expand_activation[0]
Conv2D) [0]']
block5b_bn (BatchNormaliza (None, 14, 14, 672) 2688 ['block5b_dwconv2[0][0]']
tion)
block5b_activation (Activa (None, 14, 14, 672) 0 ['block5b_bn[0][0]']
tion)
block5b_se_squeeze (Global (None, 672) 0 ['block5b_activation[0][0]']
AveragePooling2D)
block5b_se_reshape (Reshap (None, 1, 1, 672) 0 ['block5b_se_squeeze[0][0]']
e)
block5b_se_reduce (Conv2D) (None, 1, 1, 28) 18844 ['block5b_se_reshape[0][0]']
block5b_se_expand (Conv2D) (None, 1, 1, 672) 19488 ['block5b_se_reduce[0][0]']
block5b_se_excite (Multipl (None, 14, 14, 672) 0 ['block5b_activation[0][0]',
y) 'block5b_se_expand[0][0]']
block5b_project_conv (Conv (None, 14, 14, 112) 75264 ['block5b_se_excite[0][0]']
2D)
block5b_project_bn (BatchN (None, 14, 14, 112) 448 ['block5b_project_conv[0][0]']
ormalization)
block5b_drop (Dropout) (None, 14, 14, 112) 0 ['block5b_project_bn[0][0]']
block5b_add (Add) (None, 14, 14, 112) 0 ['block5b_drop[0][0]',
'block5a_project_bn[0][0]']
block5c_expand_conv (Conv2 (None, 14, 14, 672) 75264 ['block5b_add[0][0]']
D)
block5c_expand_bn (BatchNo (None, 14, 14, 672) 2688 ['block5c_expand_conv[0][0]']
rmalization)
block5c_expand_activation (None, 14, 14, 672) 0 ['block5c_expand_bn[0][0]']
(Activation)
block5c_dwconv2 (Depthwise (None, 14, 14, 672) 6048 ['block5c_expand_activation[0]
Conv2D) [0]']
block5c_bn (BatchNormaliza (None, 14, 14, 672) 2688 ['block5c_dwconv2[0][0]']
tion)
block5c_activation (Activa (None, 14, 14, 672) 0 ['block5c_bn[0][0]']
tion)
block5c_se_squeeze (Global (None, 672) 0 ['block5c_activation[0][0]']
AveragePooling2D)
block5c_se_reshape (Reshap (None, 1, 1, 672) 0 ['block5c_se_squeeze[0][0]']
e)
block5c_se_reduce (Conv2D) (None, 1, 1, 28) 18844 ['block5c_se_reshape[0][0]']
block5c_se_expand (Conv2D) (None, 1, 1, 672) 19488 ['block5c_se_reduce[0][0]']
block5c_se_excite (Multipl (None, 14, 14, 672) 0 ['block5c_activation[0][0]',
y) 'block5c_se_expand[0][0]']
block5c_project_conv (Conv (None, 14, 14, 112) 75264 ['block5c_se_excite[0][0]']
2D)
block5c_project_bn (BatchN (None, 14, 14, 112) 448 ['block5c_project_conv[0][0]']
ormalization)
block5c_drop (Dropout) (None, 14, 14, 112) 0 ['block5c_project_bn[0][0]']
block5c_add (Add) (None, 14, 14, 112) 0 ['block5c_drop[0][0]',
'block5b_add[0][0]']
block5d_expand_conv (Conv2 (None, 14, 14, 672) 75264 ['block5c_add[0][0]']
D)
block5d_expand_bn (BatchNo (None, 14, 14, 672) 2688 ['block5d_expand_conv[0][0]']
rmalization)
block5d_expand_activation (None, 14, 14, 672) 0 ['block5d_expand_bn[0][0]']
(Activation)
block5d_dwconv2 (Depthwise (None, 14, 14, 672) 6048 ['block5d_expand_activation[0]
Conv2D) [0]']
block5d_bn (BatchNormaliza (None, 14, 14, 672) 2688 ['block5d_dwconv2[0][0]']
tion)
block5d_activation (Activa (None, 14, 14, 672) 0 ['block5d_bn[0][0]']
tion)
block5d_se_squeeze (Global (None, 672) 0 ['block5d_activation[0][0]']
AveragePooling2D)
block5d_se_reshape (Reshap (None, 1, 1, 672) 0 ['block5d_se_squeeze[0][0]']
e)
block5d_se_reduce (Conv2D) (None, 1, 1, 28) 18844 ['block5d_se_reshape[0][0]']
block5d_se_expand (Conv2D) (None, 1, 1, 672) 19488 ['block5d_se_reduce[0][0]']
block5d_se_excite (Multipl (None, 14, 14, 672) 0 ['block5d_activation[0][0]',
y) 'block5d_se_expand[0][0]']
block5d_project_conv (Conv (None, 14, 14, 112) 75264 ['block5d_se_excite[0][0]']
2D)
block5d_project_bn (BatchN (None, 14, 14, 112) 448 ['block5d_project_conv[0][0]']
ormalization)
block5d_drop (Dropout) (None, 14, 14, 112) 0 ['block5d_project_bn[0][0]']
block5d_add (Add) (None, 14, 14, 112) 0 ['block5d_drop[0][0]',
'block5c_add[0][0]']
block5e_expand_conv (Conv2 (None, 14, 14, 672) 75264 ['block5d_add[0][0]']
D)
block5e_expand_bn (BatchNo (None, 14, 14, 672) 2688 ['block5e_expand_conv[0][0]']
rmalization)
block5e_expand_activation (None, 14, 14, 672) 0 ['block5e_expand_bn[0][0]']
(Activation)
block5e_dwconv2 (Depthwise (None, 14, 14, 672) 6048 ['block5e_expand_activation[0]
Conv2D) [0]']
block5e_bn (BatchNormaliza (None, 14, 14, 672) 2688 ['block5e_dwconv2[0][0]']
tion)
block5e_activation (Activa (None, 14, 14, 672) 0 ['block5e_bn[0][0]']
tion)
block5e_se_squeeze (Global (None, 672) 0 ['block5e_activation[0][0]']
AveragePooling2D)
block5e_se_reshape (Reshap (None, 1, 1, 672) 0 ['block5e_se_squeeze[0][0]']
e)
block5e_se_reduce (Conv2D) (None, 1, 1, 28) 18844 ['block5e_se_reshape[0][0]']
block5e_se_expand (Conv2D) (None, 1, 1, 672) 19488 ['block5e_se_reduce[0][0]']
block5e_se_excite (Multipl (None, 14, 14, 672) 0 ['block5e_activation[0][0]',
y) 'block5e_se_expand[0][0]']
block5e_project_conv (Conv (None, 14, 14, 112) 75264 ['block5e_se_excite[0][0]']
2D)
block5e_project_bn (BatchN (None, 14, 14, 112) 448 ['block5e_project_conv[0][0]']
ormalization)
block5e_drop (Dropout) (None, 14, 14, 112) 0 ['block5e_project_bn[0][0]']
block5e_add (Add) (None, 14, 14, 112) 0 ['block5e_drop[0][0]',
'block5d_add[0][0]']
block6a_expand_conv (Conv2 (None, 14, 14, 672) 75264 ['block5e_add[0][0]']
D)
block6a_expand_bn (BatchNo (None, 14, 14, 672) 2688 ['block6a_expand_conv[0][0]']
rmalization)
block6a_expand_activation (None, 14, 14, 672) 0 ['block6a_expand_bn[0][0]']
(Activation)
block6a_dwconv2 (Depthwise (None, 7, 7, 672) 6048 ['block6a_expand_activation[0]
Conv2D) [0]']
block6a_bn (BatchNormaliza (None, 7, 7, 672) 2688 ['block6a_dwconv2[0][0]']
tion)
block6a_activation (Activa (None, 7, 7, 672) 0 ['block6a_bn[0][0]']
tion)
block6a_se_squeeze (Global (None, 672) 0 ['block6a_activation[0][0]']
AveragePooling2D)
block6a_se_reshape (Reshap (None, 1, 1, 672) 0 ['block6a_se_squeeze[0][0]']
e)
block6a_se_reduce (Conv2D) (None, 1, 1, 28) 18844 ['block6a_se_reshape[0][0]']
block6a_se_expand (Conv2D) (None, 1, 1, 672) 19488 ['block6a_se_reduce[0][0]']
block6a_se_excite (Multipl (None, 7, 7, 672) 0 ['block6a_activation[0][0]',
y) 'block6a_se_expand[0][0]']
block6a_project_conv (Conv (None, 7, 7, 192) 129024 ['block6a_se_excite[0][0]']
2D)
block6a_project_bn (BatchN (None, 7, 7, 192) 768 ['block6a_project_conv[0][0]']
ormalization)
block6b_expand_conv (Conv2 (None, 7, 7, 1152) 221184 ['block6a_project_bn[0][0]']
D)
block6b_expand_bn (BatchNo (None, 7, 7, 1152) 4608 ['block6b_expand_conv[0][0]']
rmalization)
block6b_expand_activation (None, 7, 7, 1152) 0 ['block6b_expand_bn[0][0]']
(Activation)
block6b_dwconv2 (Depthwise (None, 7, 7, 1152) 10368 ['block6b_expand_activation[0]
Conv2D) [0]']
block6b_bn (BatchNormaliza (None, 7, 7, 1152) 4608 ['block6b_dwconv2[0][0]']
tion)
block6b_activation (Activa (None, 7, 7, 1152) 0 ['block6b_bn[0][0]']
tion)
block6b_se_squeeze (Global (None, 1152) 0 ['block6b_activation[0][0]']
AveragePooling2D)
block6b_se_reshape (Reshap (None, 1, 1, 1152) 0 ['block6b_se_squeeze[0][0]']
e)
block6b_se_reduce (Conv2D) (None, 1, 1, 48) 55344 ['block6b_se_reshape[0][0]']
block6b_se_expand (Conv2D) (None, 1, 1, 1152) 56448 ['block6b_se_reduce[0][0]']
block6b_se_excite (Multipl (None, 7, 7, 1152) 0 ['block6b_activation[0][0]',
y) 'block6b_se_expand[0][0]']
block6b_project_conv (Conv (None, 7, 7, 192) 221184 ['block6b_se_excite[0][0]']
2D)
block6b_project_bn (BatchN (None, 7, 7, 192) 768 ['block6b_project_conv[0][0]']
ormalization)
block6b_drop (Dropout) (None, 7, 7, 192) 0 ['block6b_project_bn[0][0]']
block6b_add (Add) (None, 7, 7, 192) 0 ['block6b_drop[0][0]',
'block6a_project_bn[0][0]']
block6c_expand_conv (Conv2 (None, 7, 7, 1152) 221184 ['block6b_add[0][0]']
D)
block6c_expand_bn (BatchNo (None, 7, 7, 1152) 4608 ['block6c_expand_conv[0][0]']
rmalization)
block6c_expand_activation (None, 7, 7, 1152) 0 ['block6c_expand_bn[0][0]']
(Activation)
block6c_dwconv2 (Depthwise (None, 7, 7, 1152) 10368 ['block6c_expand_activation[0]
Conv2D) [0]']
block6c_bn (BatchNormaliza (None, 7, 7, 1152) 4608 ['block6c_dwconv2[0][0]']
tion)
block6c_activation (Activa (None, 7, 7, 1152) 0 ['block6c_bn[0][0]']
tion)
block6c_se_squeeze (Global (None, 1152) 0 ['block6c_activation[0][0]']
AveragePooling2D)
block6c_se_reshape (Reshap (None, 1, 1, 1152) 0 ['block6c_se_squeeze[0][0]']
e)
block6c_se_reduce (Conv2D) (None, 1, 1, 48) 55344 ['block6c_se_reshape[0][0]']
block6c_se_expand (Conv2D) (None, 1, 1, 1152) 56448 ['block6c_se_reduce[0][0]']
block6c_se_excite (Multipl (None, 7, 7, 1152) 0 ['block6c_activation[0][0]',
y) 'block6c_se_expand[0][0]']
block6c_project_conv (Conv (None, 7, 7, 192) 221184 ['block6c_se_excite[0][0]']
2D)
block6c_project_bn (BatchN (None, 7, 7, 192) 768 ['block6c_project_conv[0][0]']
ormalization)
block6c_drop (Dropout) (None, 7, 7, 192) 0 ['block6c_project_bn[0][0]']
block6c_add (Add) (None, 7, 7, 192) 0 ['block6c_drop[0][0]',
'block6b_add[0][0]']
block6d_expand_conv (Conv2 (None, 7, 7, 1152) 221184 ['block6c_add[0][0]']
D)
block6d_expand_bn (BatchNo (None, 7, 7, 1152) 4608 ['block6d_expand_conv[0][0]']
rmalization)
block6d_expand_activation (None, 7, 7, 1152) 0 ['block6d_expand_bn[0][0]']
(Activation)
block6d_dwconv2 (Depthwise (None, 7, 7, 1152) 10368 ['block6d_expand_activation[0]
Conv2D) [0]']
block6d_bn (BatchNormaliza (None, 7, 7, 1152) 4608 ['block6d_dwconv2[0][0]']
tion)
block6d_activation (Activa (None, 7, 7, 1152) 0 ['block6d_bn[0][0]']
tion)
block6d_se_squeeze (Global (None, 1152) 0 ['block6d_activation[0][0]']
AveragePooling2D)
block6d_se_reshape (Reshap (None, 1, 1, 1152) 0 ['block6d_se_squeeze[0][0]']
e)
block6d_se_reduce (Conv2D) (None, 1, 1, 48) 55344 ['block6d_se_reshape[0][0]']
block6d_se_expand (Conv2D) (None, 1, 1, 1152) 56448 ['block6d_se_reduce[0][0]']
block6d_se_excite (Multipl (None, 7, 7, 1152) 0 ['block6d_activation[0][0]',
y) 'block6d_se_expand[0][0]']
block6d_project_conv (Conv (None, 7, 7, 192) 221184 ['block6d_se_excite[0][0]']
2D)
block6d_project_bn (BatchN (None, 7, 7, 192) 768 ['block6d_project_conv[0][0]']
ormalization)
block6d_drop (Dropout) (None, 7, 7, 192) 0 ['block6d_project_bn[0][0]']
block6d_add (Add) (None, 7, 7, 192) 0 ['block6d_drop[0][0]',
'block6c_add[0][0]']
block6e_expand_conv (Conv2 (None, 7, 7, 1152) 221184 ['block6d_add[0][0]']
D)
block6e_expand_bn (BatchNo (None, 7, 7, 1152) 4608 ['block6e_expand_conv[0][0]']
rmalization)
block6e_expand_activation (None, 7, 7, 1152) 0 ['block6e_expand_bn[0][0]']
(Activation)
block6e_dwconv2 (Depthwise (None, 7, 7, 1152) 10368 ['block6e_expand_activation[0]
Conv2D) [0]']
block6e_bn (BatchNormaliza (None, 7, 7, 1152) 4608 ['block6e_dwconv2[0][0]']
tion)
block6e_activation (Activa (None, 7, 7, 1152) 0 ['block6e_bn[0][0]']
tion)
block6e_se_squeeze (Global (None, 1152) 0 ['block6e_activation[0][0]']
AveragePooling2D)
block6e_se_reshape (Reshap (None, 1, 1, 1152) 0 ['block6e_se_squeeze[0][0]']
e)
block6e_se_reduce (Conv2D) (None, 1, 1, 48) 55344 ['block6e_se_reshape[0][0]']
block6e_se_expand (Conv2D) (None, 1, 1, 1152) 56448 ['block6e_se_reduce[0][0]']
block6e_se_excite (Multipl (None, 7, 7, 1152) 0 ['block6e_activation[0][0]',
y) 'block6e_se_expand[0][0]']
block6e_project_conv (Conv (None, 7, 7, 192) 221184 ['block6e_se_excite[0][0]']
2D)
block6e_project_bn (BatchN (None, 7, 7, 192) 768 ['block6e_project_conv[0][0]']
ormalization)
block6e_drop (Dropout) (None, 7, 7, 192) 0 ['block6e_project_bn[0][0]']
block6e_add (Add) (None, 7, 7, 192) 0 ['block6e_drop[0][0]',
'block6d_add[0][0]']
block6f_expand_conv (Conv2 (None, 7, 7, 1152) 221184 ['block6e_add[0][0]']
D)
block6f_expand_bn (BatchNo (None, 7, 7, 1152) 4608 ['block6f_expand_conv[0][0]']
rmalization)
block6f_expand_activation (None, 7, 7, 1152) 0 ['block6f_expand_bn[0][0]']
(Activation)
block6f_dwconv2 (Depthwise (None, 7, 7, 1152) 10368 ['block6f_expand_activation[0]
Conv2D) [0]']
block6f_bn (BatchNormaliza (None, 7, 7, 1152) 4608 ['block6f_dwconv2[0][0]']
tion)
block6f_activation (Activa (None, 7, 7, 1152) 0 ['block6f_bn[0][0]']
tion)
block6f_se_squeeze (Global (None, 1152) 0 ['block6f_activation[0][0]']
AveragePooling2D)
block6f_se_reshape (Reshap (None, 1, 1, 1152) 0 ['block6f_se_squeeze[0][0]']
e)
block6f_se_reduce (Conv2D) (None, 1, 1, 48) 55344 ['block6f_se_reshape[0][0]']
block6f_se_expand (Conv2D) (None, 1, 1, 1152) 56448 ['block6f_se_reduce[0][0]']
block6f_se_excite (Multipl (None, 7, 7, 1152) 0 ['block6f_activation[0][0]',
y) 'block6f_se_expand[0][0]']
block6f_project_conv (Conv (None, 7, 7, 192) 221184 ['block6f_se_excite[0][0]']
2D)
block6f_project_bn (BatchN (None, 7, 7, 192) 768 ['block6f_project_conv[0][0]']
ormalization)
block6f_drop (Dropout) (None, 7, 7, 192) 0 ['block6f_project_bn[0][0]']
block6f_add (Add) (None, 7, 7, 192) 0 ['block6f_drop[0][0]',
'block6e_add[0][0]']
block6g_expand_conv (Conv2 (None, 7, 7, 1152) 221184 ['block6f_add[0][0]']
D)
block6g_expand_bn (BatchNo (None, 7, 7, 1152) 4608 ['block6g_expand_conv[0][0]']
rmalization)
block6g_expand_activation (None, 7, 7, 1152) 0 ['block6g_expand_bn[0][0]']
(Activation)
block6g_dwconv2 (Depthwise (None, 7, 7, 1152) 10368 ['block6g_expand_activation[0]
Conv2D) [0]']
block6g_bn (BatchNormaliza (None, 7, 7, 1152) 4608 ['block6g_dwconv2[0][0]']
tion)
block6g_activation (Activa (None, 7, 7, 1152) 0 ['block6g_bn[0][0]']
tion)
block6g_se_squeeze (Global (None, 1152) 0 ['block6g_activation[0][0]']
AveragePooling2D)
block6g_se_reshape (Reshap (None, 1, 1, 1152) 0 ['block6g_se_squeeze[0][0]']
e)
block6g_se_reduce (Conv2D) (None, 1, 1, 48) 55344 ['block6g_se_reshape[0][0]']
block6g_se_expand (Conv2D) (None, 1, 1, 1152) 56448 ['block6g_se_reduce[0][0]']
block6g_se_excite (Multipl (None, 7, 7, 1152) 0 ['block6g_activation[0][0]',
y) 'block6g_se_expand[0][0]']
block6g_project_conv (Conv (None, 7, 7, 192) 221184 ['block6g_se_excite[0][0]']
2D)
block6g_project_bn (BatchN (None, 7, 7, 192) 768 ['block6g_project_conv[0][0]']
ormalization)
block6g_drop (Dropout) (None, 7, 7, 192) 0 ['block6g_project_bn[0][0]']
block6g_add (Add) (None, 7, 7, 192) 0 ['block6g_drop[0][0]',
'block6f_add[0][0]']
block6h_expand_conv (Conv2 (None, 7, 7, 1152) 221184 ['block6g_add[0][0]']
D)
block6h_expand_bn (BatchNo (None, 7, 7, 1152) 4608 ['block6h_expand_conv[0][0]']
rmalization)
block6h_expand_activation (None, 7, 7, 1152) 0 ['block6h_expand_bn[0][0]']
(Activation)
block6h_dwconv2 (Depthwise (None, 7, 7, 1152) 10368 ['block6h_expand_activation[0]
Conv2D) [0]']
block6h_bn (BatchNormaliza (None, 7, 7, 1152) 4608 ['block6h_dwconv2[0][0]']
tion)
block6h_activation (Activa (None, 7, 7, 1152) 0 ['block6h_bn[0][0]']
tion)
block6h_se_squeeze (Global (None, 1152) 0 ['block6h_activation[0][0]']
AveragePooling2D)
block6h_se_reshape (Reshap (None, 1, 1, 1152) 0 ['block6h_se_squeeze[0][0]']
e)
block6h_se_reduce (Conv2D) (None, 1, 1, 48) 55344 ['block6h_se_reshape[0][0]']
block6h_se_expand (Conv2D) (None, 1, 1, 1152) 56448 ['block6h_se_reduce[0][0]']
block6h_se_excite (Multipl (None, 7, 7, 1152) 0 ['block6h_activation[0][0]',
y) 'block6h_se_expand[0][0]']
block6h_project_conv (Conv (None, 7, 7, 192) 221184 ['block6h_se_excite[0][0]']
2D)
block6h_project_bn (BatchN (None, 7, 7, 192) 768 ['block6h_project_conv[0][0]']
ormalization)
block6h_drop (Dropout) (None, 7, 7, 192) 0 ['block6h_project_bn[0][0]']
block6h_add (Add) (None, 7, 7, 192) 0 ['block6h_drop[0][0]',
'block6g_add[0][0]']
top_conv (Conv2D) (None, 7, 7, 1280) 245760 ['block6h_add[0][0]']
top_bn (BatchNormalization (None, 7, 7, 1280) 5120 ['top_conv[0][0]']
)
top_activation (Activation (None, 7, 7, 1280) 0 ['top_bn[0][0]']
)
avg_pool (GlobalAveragePoo (None, 1280) 0 ['top_activation[0][0]']
ling2D)
top_dropout (Dropout) (None, 1280) 0 ['avg_pool[0][0]']
predictions (Dense) (None, 1000) 1281000 ['top_dropout[0][0]']
==================================================================================================
Total params: 7200312 (27.47 MB)
Trainable params: 7139704 (27.24 MB)
Non-trainable params: 60608 (236.75 KB)
__________________________________________________________________________________________________
Woah! Look at all those layers... this is what the "deep" in deep learning means! A deep number of layers.
How about we count the number of layers?
# Count the number of layers
print(f"Number of layers in base_model: {len(base_model.layers)}")
Number of layers in base_model: 273
273 layers!
Wow, there's a lot going on.
Rather than stepping through and explaining each layer one by one, I'll leave that for the curious mind to research on their own.
Just know that when starting out in deep learning, you don't need to know what's happening in every layer of a model to be able to use it.
For now, let's pay attention to a few things:
- The input layer (the first layer) input shape, this will tell us the shape of the data the model expects as input.
- The output layer (the last layer) output shape, this will tell us the shape of the data the model will output.
- The number of parameters of the model, these are "learnable" numbers (also called weights) that a model will use to find patterns in and represent the data. Generally, the more parameters a model has, the more learning capacity it has.
- The number of layers a model has. Generally, the more layers a model has, the more learning capacity it has (each layer will learn progressively deeper patterns from the data). However, this benefit caps out at a certain point.
Let's step through each of these.
TK - Model input and output shapes¶
One of the most important practical steps in using a deep learning model is knowing its input and output shapes.
Two questions to ask:
- What is the shape of my input data?
- What is the ideal shape of my output data?
In our case, our input data has the shape (32, 224, 224, 3) or (batch_size, height, width, colour_channels).
And our ideal output shape will be (32, 120) or (batch_size, number_of_dog_classes).
Your input and output shapes will differ depending on the problem and data you're working with.
But as you get deeper into the world of machine learning (and deep learning), you'll find mismatched input and output shapes are one of the most common sources of errors.
We can check our model's input and output shapes with the .input_shape and .output_shape attributes.
# Check the input shape of our model
base_model.input_shape
(None, 224, 224, 3)
Nice! Looks like our model's input shape is where we want it (remember, None in this case is a wildcard dimension, meaning it could be any value, but we've set ours to 32 via our batch size).
This is because the model we chose, tf.keras.applications.efficientnet_v2.EfficientNetV2B0, has been trained on images the same size as our images.
If our model had a different input shape, we'd have to make sure we processed our images to be the same shape.
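As a sketch of what that processing might look like, here's how `tf.image.resize` could reshape an image's spatial dimensions to match a model's expected input (the image tensor here is a made-up stand-in, not one of our dog photos):

```python
import tensorflow as tf

# Hypothetical image with a different size than the model expects
image = tf.random.uniform(shape=(500, 375, 3))  # (height, width, colour_channels)

# Resize the spatial dimensions to the model's expected input size (224x224)
resized_image = tf.image.resize(image, size=(224, 224))

print(f"Original shape: {image.shape}")  # (500, 375, 3)
print(f"Resized shape: {resized_image.shape}")  # (224, 224, 3)
```

Note that resizing changes the height and width but leaves the colour channels untouched.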
Now let's check the output shape.
# Check the model's output shape
base_model.output_shape
(None, 1000)
Hmm, is this what we're after?
Since we have 120 dog classes, we'd like an output shape of (None, 120).
Why is it by default (None, 1000)?
This is because the model has been trained already on ImageNet, a dataset of 1,000,000+ images with 1000 classes (hence the 1000 in the output shape).
How can we change this?
Let's recreate a base_model instance, except this time we'll change the classes parameter to 120.
# Create a base model with 120 output classes
base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
include_top=True,
include_preprocessing=True,
weights="imagenet",
input_shape=INPUT_SHAPE,
classes=len(dog_names)
)
base_model.output_shape
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-62-5e9b29e6f858> in <cell line: 2>()
      1 # Create a base model with 120 output classes
----> 2 base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
      3     include_top=True,
      4     include_preprocessing=True,
      5     weights="imagenet",

/usr/local/lib/python3.10/dist-packages/keras/src/applications/efficientnet_v2.py in EfficientNetV2B0(include_top, weights, input_tensor, input_shape, pooling, classes, classifier_activation, include_preprocessing)
   1128     include_preprocessing=True,
   1129 ):
-> 1130     return EfficientNetV2(
   1131         width_coefficient=1.0,
   1132         depth_coefficient=1.0,

/usr/local/lib/python3.10/dist-packages/keras/src/applications/efficientnet_v2.py in EfficientNetV2(width_coefficient, depth_coefficient, default_size, dropout_rate, drop_connect_rate, depth_divisor, min_depth, bn_momentum, activation, blocks_args, model_name, include_top, weights, input_tensor, input_shape, pooling, classes, classifier_activation, include_preprocessing)
    932
    933     if weights == "imagenet" and include_top and classes != 1000:
--> 934         raise ValueError(
    935             "If using `weights` as `'imagenet'` with `include_top`"
    936             " as true, `classes` should be 1000"

ValueError: If using `weights` as `'imagenet'` with `include_top` as true, `classes` should be 1000. Received: classes=120
Oh damn!
We get an error:
ValueError: If using weights as 'imagenet' with include_top as true, classes should be 1000. Received: classes=120
What this is saying is that if we want to use the pretrained 'imagenet' weights (which we do, to leverage the visual patterns/features the model has already learned on ImageNet), we need to change the parameters we pass to the base_model.
What we're going to do is create our own top layers.
We can do this by setting include_top=False.
What this means is we'll use most of the model's existing layers to extract features and patterns out of our images and then customize the final few layers to our own problem.
This kind of transfer learning is often called feature extraction: a setup where you use an existing model's pretrained weights to extract features (or patterns) from your own custom data.
You can then take those extracted features and further tailor them to your own use case.
TK - image of customizing top layer
Let's create an instance of base_model without a top layer.
# Create a base model with no top
base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
include_top=False,
include_preprocessing=True,
weights="imagenet",
input_shape=INPUT_SHAPE,
)
# Check the output shape
base_model.output_shape
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/efficientnet_v2/efficientnetv2-b0_notop.h5 24274472/24274472 [==============================] - 1s 0us/step
(None, 7, 7, 1280)
Hmm, what's this output shape?
This still isn't what we want (we're after (None, 120) for our number of dog classes).
How about we check the number of layers again?
# Count the number of layers
print(f"Number of layers in base_model: {len(base_model.layers)}")
Number of layers in base_model: 270
Looks like our new base_model has fewer layers than our previous one.
This is because we used include_top=False.
This means we've still got 270 base layers to extract features and patterns from our images, however, it also means we get to customize the output layers to our liking.
We'll come back to this shortly.
TK - Model parameters¶
In traditional programming, you write a list of rules for inputs to go in, get manipulated in some predefined way and then outputs come out.
However, as we've discussed, machine learning switches the order.
Inputs and ideal outputs go in (for example, dog images and their corresponding labels) and rules come out.
A model's parameters are the learned rules.
And learned is the important point.
In an ideal setup, we never tell the model what parameters to learn, it learns them itself.
Note: Parameters are values learned by a model, whereas hyperparameters (e.g. batch size) are values set by a human.
Parameters also get referred to as "weights" or "patterns" or "learned features" or "learned representations".
Generally, the more parameters a model has, the more capacity it has to learn.
Each layer in a deep learning model will have a specific number of parameters (these vary depending on which layer you use).
The benefit of using a preconstructed model and transfer learning is that someone else has done the hard work in finding what combination of layers leads to a good set of parameters (a big thank you to these wonderful people).
We can count the number of parameters in a model/layer via the .count_params() method.
# Check the number of parameters in our model
base_model.count_params()
5919312
Holy smokes!
Our model has 5,919,312 parameters!
That means each time an image goes through our model, it will be influenced in some small way by 5,919,312 numbers.
Each one of these is a potential learning opportunity (except for parameters that are non-trainable but we'll get to that soon too).
Now, you may be thinking, 5 million+ parameters sounds like a lot.
And it is.
However, many modern large-scale models, such as GPT-3 (175B parameters) and GPT-4 (exact size unpublished, but larger still), deal in the billions of parameters (note: this is written in 2024, so if you're reading this in the future, parameter counts may be in the trillions).
Generally, more parameters leads to better models.
However, there are always tradeoffs.
More parameters means more compute power to run the models.
In practice, if you have limited compute power (e.g. a single GPU on Google Colab), it's best to start with smaller models and gradually increase the size when necessary.
We can get the trainable and non-trainable parameters from our model with the trainable_weights and non_trainable_weights attributes (remember, parameters are also referred to as weights).
Note: Trainable weights are parameters of the model which are updated by backpropagation during training (they are changed to better match the data), whereas non-trainable weights are parameters of the model which are not updated by backpropagation during training (they are fixed in place).
Let's write a function to count the trainable, non-trainable and total parameters of a model.
import numpy as np
def count_parameters(model, print_output=True):
    """
    Counts the number of trainable, non-trainable and total parameters of a given model.
    """
    # A weight tensor's parameter count is the product of its shape dimensions
    trainable_parameters = np.sum([np.prod(layer.shape) for layer in model.trainable_weights])
    non_trainable_parameters = np.sum([np.prod(layer.shape) for layer in model.non_trainable_weights])
    total_parameters = trainable_parameters + non_trainable_parameters
    if print_output:
        print(f"Model {model.name} parameter counts:")
        print(f"Total parameters: {total_parameters}")
        print(f"Trainable parameters: {trainable_parameters}")
        print(f"Non-trainable parameters: {non_trainable_parameters}")
    else:
        return total_parameters, trainable_parameters, non_trainable_parameters
count_parameters(model=base_model, print_output=True)
Model efficientnetv2-b0 parameter counts:
Total parameters: 5919312
Trainable parameters: 5858704
Non-trainable parameters: 60608
Nice! It looks like our function worked.
Most of our model's parameters are trainable.
This means they will be tweaked as they see more images of dogs.
However, a standard practice in transfer learning is to freeze the base layers of a model and only train the custom top layers to suit your problem.
TK image - freeze base layers, train top layers
In other words, keep the patterns an existing model has learned on a similar problem (if they're good) to form a base representation of an input sample and then manipulate that base representation to suit our needs.
Why do this?
It's faster.
The fewer trainable parameters, the faster your model will train and the faster your experiments will be.
So how do we freeze the parameters of our base_model?
We can set its .trainable attribute to False.
# Freeze the base model
base_model.trainable = False
base_model.trainable
False
base_model frozen!
Now let's check the number of trainable and non-trainable parameters.
count_parameters(model=base_model, print_output=True)
Model efficientnetv2-b0 parameter counts:
Total parameters: 5919312.0
Trainable parameters: 0.0
Non-trainable parameters: 5919312
Beautiful!
Looks like all of the parameters in our base_model are now non-trainable (frozen).
This means they won't be updated during training.
TK - Passing data through our model¶
We've spoken a couple of times about how our base_model is a "feature extractor" or "pattern extractor".
But what does this mean?
It means that when a data sample goes through the base_model, its numbers get manipulated into a compressed set of features.
In other words, the layers of the model will each perform a calculation on the sample eventually leading to an output tensor with patterns the model has deemed most important.
This is often referred to as a compressed feature space.
That's one of the central ideas of deep learning.
Take a large input (e.g. an image tensor of shape [224, 224, 3]) and compress it into a smaller output (e.g. a feature vector of shape [1280]) that captures a useful representation of the input.
Note: A feature vector is also referred to as an embedding, a compressed representation of a data sample that makes it useful. The concept of embeddings is not limited to images either, the concept of embeddings stretches across all data types (text, images, video, audio + more).
TK image - compression of input image
We can see this in action by passing a single image through our base_model.
# Extract features from a single image using our base model
feature_extraction = base_model(image_batch[0])
feature_extraction
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-69-957d897dc1dc> in <cell line: 2>()
      1 # Extract features from a single image using our base model
----> 2 feature_extraction = base_model(image_batch[0])
      3 feature_extraction

/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py in error_handler(*args, **kwargs)
     68     # To get the full stack trace, call:
     69     # `tf.debugging.disable_traceback_filtering()`
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

/usr/local/lib/python3.10/dist-packages/keras/src/engine/input_spec.py in assert_input_compatibility(input_spec, inputs, layer_name)
    296     if spec_dim is not None and dim is not None:
    297         if spec_dim != dim:
--> 298             raise ValueError(
    299                 f'Input {input_index} of layer "{layer_name}" is '
    300                 "incompatible with the layer: "

ValueError: Input 0 of layer "efficientnetv2-b0" is incompatible with the layer: expected shape=(None, 224, 224, 3), found shape=(224, 224, 3)
Oh no!
Another error...
ValueError: Input 0 of layer "efficientnetv2-b0" is incompatible with the layer: expected shape=(None, 224, 224, 3), found shape=(224, 224, 3)
We've stumbled upon one of the most common errors in machine learning, shape errors.
In our case, the shape of the data we're trying to put into the model doesn't match the input shape the model is expecting.
Our input data shape is (224, 224, 3) ((height, width, colour_channels)), however, our model is expecting (None, 224, 224, 3) ((batch_size, height, width, colour_channels)).
We can fix this error by adding a singular batch_size dimension to our input, making it (1, 224, 224, 3) (a batch_size of 1 for a single sample).
To do so, we can use tf.expand_dims(input=target_sample, axis=0), where target_sample is our input tensor and axis=0 means we want to expand the first dimension.
# Current image shape
shape_of_image_without_batch = image_batch[0].shape
# Add a batch dimension to our single image
shape_of_image_with_batch = tf.expand_dims(input=image_batch[0], axis=0).shape
print(f"Shape of image without batch: {shape_of_image_without_batch}")
print(f"Shape of image with batch: {shape_of_image_with_batch}")
Perfect!
Now let's pass this image with a batch dimension to our base_model.
# Extract features from a single image using our base model
feature_extraction = base_model(tf.expand_dims(image_batch[0], axis=0))
feature_extraction
<tf.Tensor: shape=(1, 7, 7, 1280), dtype=float32, numpy=
array([[[[-2.19177201e-01, -3.44185606e-02, -1.40321642e-01, ...,
-1.44454449e-01, -2.73809850e-01, -7.41252452e-02],
[-8.69670734e-02, -6.48750067e-02, -2.14546964e-01, ...,
-4.57209721e-02, -2.77900100e-01, -8.20885971e-02],
[-2.76872963e-01, -8.26781020e-02, -3.85153107e-02, ...,
-2.72128999e-01, -2.52802134e-01, -2.28105962e-01],
...,
[-1.01604000e-01, -3.55145968e-02, -2.23027021e-01, ...,
-2.26227805e-01, -8.61771777e-02, -1.60450727e-01],
[-5.87608740e-02, -4.65543661e-03, -1.06193267e-01, ...,
-2.87548676e-02, -9.06914026e-02, -1.82624385e-01],
[-6.27618432e-02, -1.38620799e-03, 1.52704502e-02, ...,
-7.85450079e-03, -1.84584558e-01, -2.62404829e-01]],
[[-2.17334151e-01, -1.10280879e-01, -2.74605274e-01, ...,
-2.22405165e-01, -2.74738282e-01, -1.01998925e-01],
[-1.40700653e-01, -1.66820198e-01, -2.77449101e-01, ...,
2.40375683e-01, -2.77627349e-01, -9.07808691e-02],
[-2.40916476e-01, -2.00582087e-01, -2.38370374e-01, ...,
-8.27576742e-02, -2.78428614e-01, -1.23056054e-01],
...,
[-2.67296195e-01, -5.43131726e-03, -6.44061863e-02, ...,
-3.34720500e-02, -1.55141622e-01, -3.23073938e-02],
[-2.66513556e-01, -2.09966358e-02, -1.50375053e-01, ...,
-6.29274473e-02, -2.69798309e-01, -2.74081439e-01],
[-8.39830115e-02, -1.58605091e-02, -2.78447241e-01, ...,
-1.43555822e-02, -2.77474761e-01, 1.37483165e-01]],
[[-2.15840712e-01, 4.50323820e-01, -7.51058161e-02, ...,
-2.43637279e-01, -2.75048614e-01, -6.00421876e-02],
[-2.39066556e-01, -2.25066260e-01, -4.89832312e-02, ...,
-2.77957618e-01, -1.14677951e-01, -2.69968715e-02],
[-1.60943881e-01, -2.12972730e-01, -1.08622171e-01, ...,
-2.78464079e-01, -1.95970193e-01, -2.92074662e-02],
...,
[-2.67642140e-01, -7.13412274e-10, -2.47387841e-01, ...,
-1.27752789e-03, 1.69062471e+00, -1.07747754e-02],
[-2.69456387e-01, -3.02123808e-05, -2.19904676e-01, ...,
-1.19841937e-02, 6.54936790e-01, 4.92877871e-01],
[-1.83339473e-02, -9.84105989e-02, -2.77752399e-01, ...,
-9.53171253e-02, -2.76987553e-01, -1.81873620e-01]],
...,
[[-6.59235120e-02, -1.64803467e-03, -1.58951283e-01, ...,
-1.34164095e-01, -6.30896613e-02, -7.77927637e-02],
[-1.83377475e-01, -4.98497509e-04, -1.57654762e-01, ...,
-4.48885784e-02, -1.06884383e-01, -2.78372377e-01],
[-2.45749369e-01, -9.95399058e-03, -1.79216102e-01, ...,
-1.02837617e-02, -1.84168354e-01, -1.70697242e-01],
...,
[ 2.22050592e-01, -2.04384560e-04, -1.46467671e-01, ...,
-2.65387502e-02, -1.85434178e-01, -9.71652716e-02],
[ 1.52228832e+00, -3.39617883e-03, -3.22414264e-02, ...,
-1.19287046e-02, -1.46435276e-01, -8.73169452e-02],
[-1.89164400e-01, -5.49114570e-02, -2.05218419e-01, ...,
-1.32163316e-01, -1.48950770e-01, -1.18042991e-01]],
[[-2.16520607e-01, -7.84920622e-03, -1.43650264e-01, ...,
-1.73660204e-01, -4.83706780e-02, -3.76228467e-02],
[-2.78293848e-01, -6.24539470e-03, -2.28590608e-01, ...,
-2.06465453e-01, -1.93291768e-01, -9.23046917e-02],
[-2.40500003e-01, -2.73558766e-01, -1.58736348e-01, ...,
-4.13209312e-02, -2.64240265e-01, -3.26484852e-02],
...,
[-2.31358394e-01, -2.72292078e-01, -6.80670887e-02, ...,
-2.16453914e-02, -2.71368980e-01, -3.88960652e-02],
[-2.45319903e-01, -2.78179497e-01, -6.18890636e-02, ...,
-1.86282583e-02, -2.23804727e-01, -2.72233319e-02],
[-2.31111392e-01, -2.37449735e-01, -5.13911694e-02, ...,
-4.55225781e-02, -2.74753064e-01, -3.51530202e-02]],
[[-3.96142267e-02, -1.39998682e-02, -9.56050456e-02, ...,
-2.33392462e-01, -1.83407709e-01, -4.99856956e-02],
[-2.60713607e-01, -3.96164991e-02, -1.29626304e-01, ...,
-2.78417081e-01, -2.78285533e-01, -7.70441368e-02],
[-8.02241415e-02, -2.30456606e-01, -1.13508031e-01, ...,
-5.45607917e-02, -2.71063268e-01, -2.75666509e-02],
...,
[-9.41052362e-02, -2.42691532e-01, -5.48249595e-02, ...,
-2.13044193e-02, -2.63691694e-01, -9.28506851e-02],
[-9.08804908e-02, -2.40457997e-01, -7.88932368e-02, ...,
-3.80579121e-02, -2.71065891e-01, -4.05692160e-02],
[-1.26358300e-01, -2.17053503e-01, -7.44825602e-02, ...,
-5.66985942e-02, -2.75216103e-01, -6.91162944e-02]]]],
dtype=float32)>
Woah! Look at all those numbers!
After passing through ~270 layers, this is the numerical representation our model has created of our input image.
You might be thinking, okay, there's a lot here, how can I possibly understand all of them?
Well, with enough effort, you might.
However, these numbers are more for a model to understand than for a human to understand.
Let's not stop there, let's check the shape of our feature_extraction.
# Check shape of feature extraction
feature_extraction.shape
TensorShape([1, 7, 7, 1280])
Ok, looks like our model has compressed our input image into a lower dimensional feature space.
Note: Feature space (or latent space or embedding space) is a numerical space where pieces of data are represented by tensors of various dimensions. Feature space is hard for humans to imagine because it can have 1000s of dimensions (humans are only good at imagining 3-4 dimensions at most). But you can think of feature space as an area where similar items are close together. If feature space were a grocery store, one breed of dog may be in one aisle (similar numbers) whereas another breed of dog may be in the next aisle.
Let's compare the new shape to the input shape.
num_input_features = 224*224*3
feature_extraction_features = 1*7*7*1280
num_input_features / feature_extraction_features
2.4
Looks like our model has compressed the numerical representation of our input image by 2.4x so far.
But you might've noticed our feature_extraction is still a tensor.
How about we take it further and turn it into a vector and compress the representation even further?
We can do so by taking our feature_extraction tensor and pooling together the inner dimensions.
By pooling, I mean taking the average or the maximum values.
Why?
Because a neural network often has a large number of parameters, but many of them can be insignificant compared to others.
So taking the average or the max across them helps us compress the representation further while still preserving the most important features.
This process is often referred to as:
- Average pooling - Take the average across given dimensions of a tensor, can be performed with tf.keras.layers.GlobalAveragePooling2D().
- Max pooling - Take the maximum value across given dimensions of a tensor, can be performed with tf.keras.layers.GlobalMaxPooling2D().
Let's try applying average pooling to our feature extraction and see what happens.
# Turn feature extraction into a feature vector
feature_vector = tf.keras.layers.GlobalAveragePooling2D()(feature_extraction) # pass feature_extraction to the pooling layer
feature_vector
<tf.Tensor: shape=(1, 1280), dtype=float32, numpy=
array([[-0.11521906, -0.04476562, -0.12476546, ..., -0.09118073,
-0.08420841, -0.07769417]], dtype=float32)>
Ho, ho!
Looks like we've compressed our feature_extraction tensor into a feature vector.
Now if you're not sure what all these numbers mean, that's okay. I don't either.
A feature vector (also called an embedding) is supposed to be a numerical representation that's meaningful to computers.
We'll perform a few more transforms on it before it's recognizable to us.
Let's check out its shape.
# Check out the feature vector shape
feature_vector.shape
TensorShape([1, 1280])
We've reduced the shape of feature_extraction from (1, 7, 7, 1280) to (1, 1280) (we've gone from a tensor with multiple dimensions to a vector with one dimension of size 1280).
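Under the hood, global average pooling is simply a mean over the spatial (height and width) dimensions. A quick sketch with a dummy tensor (standing in for our actual feature extraction) shows the equivalence:

```python
import numpy as np
import tensorflow as tf

# Dummy tensor standing in for a (1, 7, 7, 1280) feature extraction
dummy_features = tf.random.uniform(shape=(1, 7, 7, 1280))

# Global average pooling via the layer...
pooled_layer = tf.keras.layers.GlobalAveragePooling2D()(dummy_features)

# ...is the same as taking the mean over the height and width axes (axes 1 and 2)
pooled_manual = tf.reduce_mean(dummy_features, axis=[1, 2])

print(pooled_layer.shape)  # (1, 1280)
print(np.allclose(pooled_layer, pooled_manual))  # True
```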
This is one of the main goals of deep learning, to reduce higher dimensional information into a lower dimensional but still representative space.
Let's calculate how much we've reduced the dimensionality of our single input image.
# Compare the reduction
num_input_features = 224*224*3
feature_extraction_features = 1*7*7*1280
feature_vector_features = 1*1280
print(f"Input -> feature extraction reduction factor: {num_input_features / feature_extraction_features}")
print(f"Feature extraction -> feature vector reduction factor: {feature_extraction_features / feature_vector_features}")
print(f"Input -> feature extraction -> feature vector reduction factor: {num_input_features / feature_vector_features}")
Input -> feature extraction reduction factor: 2.4
Feature extraction -> feature vector reduction factor: 49.0
Input -> feature extraction -> feature vector reduction factor: 117.6
A 117.6x reduction from our original image to its feature vector representation!
Why compress the representation like this?
Because representing our data in a compressed format but still with meaningful numbers (to a computer) means that less computation is required to reuse the patterns.
For example, imagine you have to relearn how to spell words every time you use them.
Would this be efficient?
Not at all.
Instead, you take a while to learn them at the start and then continually reuse this knowledge over time.
This is the same with a deep learning model.
It learns representative patterns in data, figures out the ideal connections between inputs and outputs and then reuses them over time.
TK - Going from image to feature vector (practice)¶
We've covered a fair bit in the past few sections.
So let's practice.
The important takeaway is that one of the main goals of deep learning is to create a model that is able to take some kind of high dimensional data (e.g. an image tensor, a text tensor, an audio tensor) and extract meaningful patterns in it whilst compressing it to a lower dimensional form (e.g. a feature vector or embedding).
We can then use this lower dimensional form for our specific use cases.
And one of the most powerful ways to do this is with transfer learning.
Taking an existing model from a similar domain to yours and applying it to your own problem.
To practice turning a data sample into a feature vector, let's start by recreating a base_model instance.
This time, we can add in a pooling layer automatically using pooling="avg" or pooling="max".
Note: I demonstrated the use of the tf.keras.layers.GlobalAveragePooling2D() layer because not all pretrained models have a pooling layer built in.
# Create a base model with no top and a pooling layer built-in
base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
include_top=False,
weights="imagenet",
input_shape=INPUT_SHAPE,
pooling="avg", # can also use "max"
include_preprocessing=True,
)
# Check the summary (optional)
# base_model.summary()
# Check the output shape
base_model.output_shape
(None, 1280)
Boom!
We get the same output shape from the base_model as we did when applying the pooling layer manually, thanks to using pooling="avg".
Let's now freeze these base weights, so they're not trainable.
# Freeze the base weights
base_model.trainable = False
# Count the parameters
count_parameters(model=base_model, print_output=True)
Model efficientnetv2-b0 parameter counts:
Total parameters: 5919312.0
Trainable parameters: 0.0
Non-trainable parameters: 5919312
And now we can pass an image through our base model and get a feature vector from it.
# Get a feature vector of a single image (don't forget to add a batch dimension)
feature_vector_2 = base_model(tf.expand_dims(image_batch[0], axis=0))
feature_vector_2
<tf.Tensor: shape=(1, 1280), dtype=float32, numpy=
array([[-0.11521906, -0.04476562, -0.12476546, ..., -0.09118073,
-0.08420841, -0.07769417]], dtype=float32)>
Wonderful!
Now is this the same as our original feature_vector?
We can find out by comparing feature_vector and feature_vector_2 and seeing if all of the values are the same with np.all().
# Compare the two feature vectors
np.all(feature_vector == feature_vector_2)
True
Perfect!
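Note that np.all checks for exact equality, which worked here because both vectors came from the same frozen weights on the same input. For floating point outputs in general, a tolerance-based check like np.allclose is safer. A quick sketch with made-up numbers:

```python
import numpy as np

# Two hypothetical feature vectors differing only by tiny floating point noise
vector_a = np.array([0.1, 0.2, 0.3], dtype=np.float32)
vector_b = vector_a + 1e-7  # simulate rounding differences between two runs

print(np.all(vector_a == vector_b))  # False, exact comparison is brittle
print(np.allclose(vector_a, vector_b))  # True, comparison within a tolerance
```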
Let's put it all together and create a full model for our dog vision problem.
TK - Creating a custom model for our dog vision problem¶
The main steps when creating any kind of deep learning model are:
- Define the input layer(s).
- Define the middle layer(s).
- Define the output layer(s).
These sound broad because they are. Deep learning models are almost infinitely customizable.
Good news is, thanks to transfer learning, all of our middle layers are defined by base_model (you could argue the input layer is created too).
So now it's up to us to define our input and output layers.
TensorFlow has two main ways of connecting layers to form a model.
- The Sequential model (tf.keras.Sequential) - Useful for making simple models with one tensor in and one tensor out, not suited for complex models.
- The Functional API - Useful for making more complex and multi-step models but can also be used for simple models.
Let's start with the Sequential model.
It takes a list of layers and will pass data through them sequentially.
Our base_model will be the input and middle layers and we'll use a tf.keras.layers.Dense() layer as the output (we'll discuss this shortly).
TK - creating a model with the sequential API¶
# Create a sequential model
tf.random.set_seed(42)
sequential_model = tf.keras.Sequential([base_model, # input and middle layers
tf.keras.layers.Dense(units=len(dog_names), # output layer
activation="softmax")])
sequential_model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
efficientnetv2-b0 (Functio (None, 1280) 5919312
nal)
dense (Dense) (None, 120) 153720
=================================================================
Total params: 6073032 (23.17 MB)
Trainable params: 153720 (600.47 KB)
Non-trainable params: 5919312 (22.58 MB)
_________________________________________________________________
Wonderful!
We've now got a model with 6,073,032 parameters, however, only 153,720 of them (the ones in the dense layer) are trainable.
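As a quick sanity check, we can compute that 153,720 figure by hand: the dense layer has one weight for every (feature, class) pair, plus one bias per class.

```python
# Where does the 153,720 trainable parameter count come from?
feature_vector_size = 1280  # EfficientNetV2-B0 outputs a 1280-long feature vector
num_classes = 120           # one output per dog breed

weights = feature_vector_size * num_classes  # 153,600 weights
biases = num_classes                         # 120 biases (one per class)
print(weights + biases)  # 153720, matching the model summary above
```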
Our dense layer (also called a fully-connected layer or feed-forward layer) takes the outputs of the base_model and performs further calculations on them to map them to our required number of classes (120 for the number of dog breeds).
We use activation="softmax" (the Softmax function) to get prediction probabilities, values between 0 and 1 which represent how much our model "thinks" a specific image relates to a certain class.
There's another common activation function called Sigmoid. If we only had two classes, for example, "dog" or "cat", we'd lean towards using this function.
Confusing, yes, but you'll get used to different functions with practice.
The following table summarizes a few use cases.
| Activation Function | Use Cases | Code |
|---|---|---|
| Sigmoid | - When you have two choices (like yes or no, true or false). - In binary classification, where you're deciding between one thing or another (like if an email is spam or not spam). - When you want the output to be a probability between 0 and 1. | tf.keras.activations.sigmoid or activation="sigmoid" |
| Softmax | - When you have more than two choices. - In multi-class classification, like if you're trying to decide if a picture is of a dog, a cat, a horse, or a bird. - When you want to compare the probabilities across different options and pick the most likely one. | tf.keras.activations.softmax or activation="softmax" |
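To make the table concrete, here's a minimal NumPy sketch (separate from our model code) of how the two functions behave on some toy raw values:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e / e.sum()

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

logits = np.array([2.0, 1.0, 0.1])  # toy raw outputs for 3 classes
probs = softmax(logits)
print(probs)              # a probability per class, highest for the largest logit
print(round(probs.sum(), 6))  # 1.0 - softmax probabilities always sum to 1
print(sigmoid(0.0))       # 0.5 - sigmoid squashes a single value into (0, 1)
```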
Now our model is built, let's check our input and output shapes.
# Check the input shape
sequential_model.input_shape
(None, 224, 224, 3)
# Check the output shape
sequential_model.output_shape
(None, 120)
Beautiful!
Our sequential model takes in an image tensor of size [None, 224, 224, 3] and outputs a vector of shape [None, 120] where None is the batch size we specify.
Let's try our sequential model out with a single image input.
# Get a single image with a batch size of 1
single_image_input = tf.expand_dims(image_batch[0], axis=0)
# Pass the image through our model
single_image_output_sequential = sequential_model(single_image_input)
# Check the output
single_image_output_sequential
<tf.Tensor: shape=(1, 120), dtype=float32, numpy=
array([[0.00783153, 0.01119391, 0.00476165, 0.0072348 , 0.00766934,
0.00753752, 0.00522398, 0.02337082, 0.00579716, 0.00539333,
0.00549823, 0.01011768, 0.00610076, 0.0109506 , 0.00540159,
0.0079683 , 0.01227358, 0.01056393, 0.00507148, 0.00996652,
0.00604106, 0.00729022, 0.0155036 , 0.00745004, 0.00628229,
0.00796217, 0.00905823, 0.00712278, 0.01243507, 0.006427 ,
0.00602891, 0.01276839, 0.00652441, 0.00842482, 0.01247454,
0.00749902, 0.01086363, 0.007803 , 0.0058652 , 0.00474356,
0.00902809, 0.00715358, 0.00981051, 0.00444271, 0.01031628,
0.00691859, 0.00699083, 0.0065892 , 0.00966169, 0.01177148,
0.00908043, 0.00729699, 0.00496712, 0.00509035, 0.00584058,
0.01068885, 0.00817651, 0.00602052, 0.00901201, 0.01008151,
0.00495409, 0.01285929, 0.00480146, 0.0108622 , 0.01421483,
0.00814719, 0.00910061, 0.00798947, 0.00789293, 0.00636969,
0.00656019, 0.01309155, 0.00754355, 0.00702062, 0.00485884,
0.00958675, 0.01086809, 0.00682202, 0.00923016, 0.00856321,
0.00482627, 0.01234931, 0.01140433, 0.00771413, 0.01140642,
0.00382939, 0.00891482, 0.00409833, 0.00771865, 0.00652135,
0.00668143, 0.00935989, 0.00784146, 0.00751913, 0.00785116,
0.00794632, 0.0079146 , 0.00798953, 0.01011222, 0.01318719,
0.00721227, 0.00736159, 0.01369175, 0.01087009, 0.00510072,
0.00843218, 0.00451756, 0.00966478, 0.01013771, 0.00715721,
0.00367131, 0.00825834, 0.00832634, 0.01225684, 0.00724481,
0.00670675, 0.00536995, 0.01070637, 0.00937007, 0.00998812]],
dtype=float32)>
Nice!
Our model has output a tensor of prediction probabilities in shape [1, 120], one value for each of our dog classes.
Thanks to the softmax function, all of these values are between 0 and 1 and they should all add up to 1 (or close to it).
# Sum the output
np.sum(single_image_output_sequential)
1.0
Beautiful!
Now how do we figure out which of the values our model thinks is most likely?
We take the index of the highest value!
We can find the index of the highest value using tf.argmax() or np.argmax().
We'll also grab the highest value itself (not just its index) using np.max().
Let's try.
# Find the index with the highest value
highest_value_index_sequential_model_output = np.argmax(single_image_output_sequential)
highest_value_sequential_model_output = np.max(single_image_output_sequential)
print(f"Highest value index: {highest_value_index_sequential_model_output} ({dog_names[highest_value_index_sequential_model_output]})")
print(f"Prediction probability: {highest_value_sequential_model_output}")
Highest value index: 7 (basenji)
Prediction probability: 0.023370817303657532
Note: These values may change every time due to the model/data being randomly initialized. Don't worry too much about them being different; in machine learning, randomness is a good thing.
This prediction probability value is quite low.
With the highest possible value being 1.0, it means the model isn't very confident in its prediction (for reference, a uniform random guess across our 120 classes would give each class a probability of 1/120 ≈ 0.0083).
Let's check the original label value of our single image.
# Check the original label value
tf.argmax(label_batch[0])
<tf.Tensor: shape=(), dtype=int64, numpy=95>
Oh no! Looks like our model predicted the wrong label (or if it got it right, it was by pure chance).
This is to be expected.
Although our model comes with pretrained parameters from ImageNet, the dense layer we added on the end is initialized with random parameters.
So in essence, our model is randomly guessing what the label should be.
How do we fix this?
We can train the model to adjust its trainable parameters to better suit the data we're working with.
For completeness let's check out the text-based label our model predicted versus the original label.
# Index on class_names with our model's highest prediction probability
sequential_model_predicted_label = class_names[tf.argmax(sequential_model(tf.expand_dims(image_batch[0], axis=0)), axis=1).numpy()[0]]
# Get the truth label
single_image_ground_truth_label = class_names[tf.argmax(label_batch[0])]
# Print predicted and ground truth labels
print(f"Sequential model predicted label: {sequential_model_predicted_label}")
print(f"Ground truth label: {single_image_ground_truth_label}")
Sequential model predicted label: basenji
Ground truth label: schipperke
TK - creating a model with the functional API¶
As mentioned before, the Keras Functional API is a way/design pattern for creating more complex models.
It can include multiple different modelling steps.
But it can also be used for simple models.
And it's the way we'll construct our Dog Vision models going forward.
Let's recreate our sequential_model using the Functional API.
We'll follow the same process as mentioned before:
- Define the input layer(s).
- Define the middle/hidden layer(s).
- Define the output layer(s).
- Bonus: Connect the inputs and outputs within an instance of tf.keras.Model().
# 1. Create input layer
inputs = tf.keras.Input(shape=INPUT_SHAPE)
# 2. Create hidden layer
x = base_model(inputs, training=False)
# 3. Create the output layer
outputs = tf.keras.layers.Dense(units=len(class_names), # one output per class
activation="softmax",
name="output_layer")(x)
# 4. Connect the inputs and outputs together
functional_model = tf.keras.Model(inputs=inputs,
outputs=outputs,
name="functional_model")
# Get a model summary
functional_model.summary()
Model: "functional_model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_4 (InputLayer) [(None, 224, 224, 3)] 0
efficientnetv2-b0 (Functio (None, 1280) 5919312
nal)
output_layer (Dense) (None, 120) 153720
=================================================================
Total params: 6073032 (23.17 MB)
Trainable params: 153720 (600.47 KB)
Non-trainable params: 5919312 (22.58 MB)
_________________________________________________________________
Functional model created!
Let's try it out.
It works in the same fashion as our sequential_model.
# Pass a single image through our functional_model
single_image_output_functional = functional_model(single_image_input)
# Find the index with the highest value
highest_value_index_functional_model_output = np.argmax(single_image_output_functional)
highest_value_functional_model_output = np.max(single_image_output_functional)
highest_value_index_functional_model_output, highest_value_functional_model_output
(69, 0.017855722)
Nice!
Looks like we got a slightly different value to our sequential_model (or they may be the same if randomness wasn't so random).
Why is this?
Because our functional_model was initialized with a random tf.keras.layers.Dense layer as well.
So the outputs of our functional_model are essentially random as well.
Not to fear, we'll fix this soon when we train our model.
Right now we've created our model with a few scattered lines of code.
How about we functionize the model creation so we can repeat it later on?
TK - Functionizing model creation¶
We've created two different kinds of models so far.
Each of which use the same layers.
Except one was with the Keras Sequential API and the other was with the Keras Functional API.
However, it would be quite tedious to rewrite that modelling code every time we wanted to create a new model.
So let's create a function called create_model() to replicate the model creation step with the Functional API.
Note: We're focused on the Functional API since it takes a bit more practice than the Sequential API.
def create_model(include_top: bool = False,
num_classes: int = 1000,
input_shape: tuple[int, int, int] = (224, 224, 3),
include_preprocessing: bool = True,
trainable: bool = False,
dropout: float = 0.2,
model_name: str = "model") -> tf.keras.Model:
"""
Create an EfficientNetV2 B0 feature extractor model with a custom classifier layer.
Args:
include_top (bool, optional): Whether to include the top (classifier) layers of the model.
num_classes (int, optional): Number of output classes for the classifier layer.
input_shape (tuple[int, int, int], optional): Input shape for the model's images (height, width, channels).
include_preprocessing (bool, optional): Whether to include preprocessing layers for image normalization.
trainable (bool, optional): Whether to make the base model trainable.
dropout (float, optional): Dropout rate for the global average pooling layer.
model_name (str, optional): Name for the created model.
Returns:
tf.keras.Model: A TensorFlow Keras model with the specified configuration.
"""
# Create base model
base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
include_top=include_top,
weights="imagenet",
input_shape=input_shape,
include_preprocessing=include_preprocessing,
pooling="avg" # Can use this instead of adding tf.keras.layers.GlobalAveragePooling2D() to the model
# pooling="max" # Can use this instead of adding tf.keras.layers.GlobalMaxPooling2D() to the model
)
# Freeze the base model (if necessary)
base_model.trainable = trainable
# Create input layer
inputs = tf.keras.Input(shape=input_shape, name="input_layer")
# Create model backbone (middle/hidden layers)
x = base_model(inputs, training=trainable)
# x = tf.keras.layers.GlobalAveragePooling2D()(x) # note: you should include pooling here if not using `pooling="avg"`
# x = tf.keras.layers.Dropout(0.2)(x) # optional regularization layer (search "dropout" for more)
# Create output layer (also known as "classifier" layer)
outputs = tf.keras.layers.Dense(units=num_classes,
activation="softmax",
name="output_layer")(x)
# Connect input and output layer
model = tf.keras.Model(inputs=inputs,
outputs=outputs,
name=model_name)
return model
What a beautiful function!
Let's try it out.
# Create a model
model_0 = create_model(num_classes=len(class_names))
model_0.summary()
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_layer (InputLayer) [(None, 224, 224, 3)] 0
efficientnetv2-b0 (Functio (None, 1280) 5919312
nal)
output_layer (Dense) (None, 120) 153720
=================================================================
Total params: 6073032 (23.17 MB)
Trainable params: 153720 (600.47 KB)
Non-trainable params: 5919312 (22.58 MB)
_________________________________________________________________
Woohoo! Looks like it worked!
Now how about we inspect each of the layers and whether they're trainable?
for layer in model_0.layers:
print(layer.name, layer.trainable)
input_layer True
efficientnetv2-b0 False
output_layer True
Nice, looks like our base_model (efficientnetv2-b0) is frozen (it's not trainable).
And our output_layer is trainable.
This means we'll be reusing the patterns learned in the base_model to feed into our output_layer and then customizing those parameters to suit our own problem.
TK - Model 0 - Train a model on 10% of the training data¶
We've seen our model make a couple of predictions on our data.
And so far it hasn't done so well.
Let's change that.
How?
By training the final layer on our model to be customized to recognizing images of dogs.
We can do so via five steps:
- Creating the model - We've done this ✅.
- Compiling the model - Here's where we'll tell the model how to improve itself and how to measure its performance.
- Fitting the model - Here's where we'll show the model examples of what we'd like it to learn (e.g. the relationship between an image of a dog and its breed).
- Evaluating the model - Once our model is trained on the training data, we can evaluate it on the testing data (data the model has never seen).
- Making a custom prediction - Finally, the best way to test a machine learning model is by seeing how it goes on custom data. This is where we'll try to make a prediction on our own custom images of dogs.
We'll work through each of these over the next few sections.
To begin, let's create a model.
To do so, we can use our create_model() function that we made earlier.
# 1. Create model
model_0 = create_model(num_classes=len(class_names),
model_name="model_0")
model_0.summary()
Model: "model_0"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_layer (InputLayer) [(None, 224, 224, 3)] 0
efficientnetv2-b0 (Functio (None, 1280) 5919312
nal)
output_layer (Dense) (None, 120) 153720
=================================================================
Total params: 6073032 (23.17 MB)
Trainable params: 153720 (600.47 KB)
Non-trainable params: 5919312 (22.58 MB)
_________________________________________________________________
Model created!
How about we compile it?
TK - Compiling a model¶
After we've created a model, the next step is to compile it.
If creating a model is putting together learning blocks, compiling a model is getting those learning blocks ready to learn.
We can compile our model_0 using the tf.keras.Model.compile() method.
There are many options we can pass to the compile() method, however, the main ones we'll be focused on are:
- The optimizer - this tells the model how to improve based on the loss value.
- The loss function - this measures how wrong the model is (e.g. how far off are its predictions from the truth, an ideal loss value is 0, meaning the model is perfectly predicting the data).
- The metric(s) - this is a human-readable value that shows how your model is performing, for example, accuracy is often used as an evaluation metric.
These three settings work together to help improve a model.
TK - Which optimizer should I use?¶
An optimizer tells a model how to improve its internal parameters (weights) to hopefully improve a loss value.
In most cases, improving the loss means to minimize it (a loss value is a measure of how wrong your model's predictions are, a perfect model will have a loss value of 0).
It does this through a process called gradient descent.
The gradients needed for gradient descent are calculated through backpropagation, a method that computes the gradient of the loss function with respect to each weight in the model.
Once the gradients have been calculated, the optimizer then tries to update the model weights so that they move in the opposite direction of the gradient (if you go down the gradient of a function, you reduce its value).
If you've never heard of the above processes, that's okay.
TensorFlow implements many of them behind the scenes.
For now, the main takeaway is that neural networks learn in the following fashion:
TK - graphic for learning paradigm
Start with random patterns/weights -> look at data -> try to predict data -> measure performance of predictions (loss function) -> update patterns/weights (optimizer) -> try to predict data -> measure performance (loss function) -> update patterns/weights (optimizer) -> ...
I'll leave the intricacies of gradient descent and backpropagation to your own extra-curricular research.
We're going to focus on using the tools TensorFlow has to offer to implement this process.
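Still, the core update loop is simple enough to sketch by hand. Here's a minimal NumPy-only toy example (a single weight and a made-up loss, not our model's actual loss) showing gradient descent shrinking a loss value:

```python
# Gradient descent on a single weight, minimizing the toy loss L(w) = (w - 3)^2.
# Its gradient is dL/dw = 2 * (w - 3); in TensorFlow, backpropagation computes
# gradients like this for every weight in the model automatically.
w = 10.0             # start with a "random" weight
learning_rate = 0.1  # step size for each update

for step in range(50):
    gradient = 2 * (w - 3.0)          # how the loss changes as w changes
    w = w - learning_rate * gradient  # move in the opposite direction of the gradient

print(round(w, 2))  # 3.0 - the weight converges to the minimum of the loss
```

Try setting learning_rate to 1.5 and the weight diverges instead of converging, which is the "too big of steps" failure mode described above.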
As for optimizer functions, there are two main options to get started:
| Optimizer | Code |
|---|---|
| Stochastic Gradient Descent (SGD) | tf.keras.optimizers.SGD() or "sgd" for short. |
| Adam | tf.keras.optimizers.Adam() or "adam" for short. |
Why these two?
Because they're the most often used in practice (you can see this via the number of machine learning papers referencing each one on paperswithcode.com).
There are many more optimizers available in the tf.keras.optimizers module too.
The good thing about using a premade optimizer from tf.keras.optimizers is that they usually come with good starting settings.
One of the main ones being the learning_rate value.
The learning_rate is one of the most important hyperparameters to set in a neural network training setup.
It determines how big a step the optimizer takes when adjusting your model's weights each iteration.
Too low and the model won't learn.
Too high and the model will try to take too big of steps.
By default, TensorFlow sets the learning rate of the Adam optimizer to 0.001 (tf.keras.optimizers.Adam(learning_rate=0.001)) which is a good setting for many problems to get started with.
We can also set this default with the shortcut optimizer="adam".
For more on finding the optimal learning rate, try searching for "finding the optimal learning rate for neural networks".
# Create optimizer (short version)
optimizer = "adam"
# The above line is the same as below
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
optimizer
<keras.src.optimizers.adam.Adam at 0x793270189360>
TK - Which loss function should I use?¶
A loss function measures how wrong your model's predictions are.
A model with poor predictions in comparison to the truth data will have a high loss value.
Whereas a model with perfect predictions (e.g. it gets every prediction correct) will have a loss value of 0.
Different problems have different loss functions.
Some of the most common ones include:
| Loss Function | Problem Type | Code |
|---|---|---|
| Mean Absolute Error (MAE) | Regression (predicting a number) | tf.keras.losses.MeanAbsoluteError or "mae" for short |
| Mean Squared Error (MSE) | Regression (predicting a number) | tf.keras.losses.MeanSquaredError |
| Binary Cross Entropy (BCE) | Binary classification | tf.keras.losses.BinaryCrossentropy |
| Categorical Cross Entropy | Multi-class classification | tf.keras.losses.CategoricalCrossentropy if your labels are one-hot encoded (e.g. [0, 0, 0, 0, 1, 0...]) or tf.keras.losses.SparseCategoricalCrossentropy if your labels are integers (e.g. [[1], [23], [43], [16]...]) |
In our case, since we're working with multi-class classification (multiple different dog breeds) and our labels are one-hot encoded, we'll be using tf.keras.losses.CategoricalCrossentropy.
We can leave all of the default parameters as they are as well.
However, if we didn't have activation="softmax" in the final layer of our model, we'd have to change from_logits=False to from_logits=True as the softmax activation function does this conversion for us.
There are more loss functions than the ones we've discussed and you can see many of them on paperswithcode.com.
TensorFlow also has many more loss function implementations available in tf.keras.losses.
Let's check out a single sample of our labels to make sure they're one-hot encoded.
# Check that our labels are one-hot encoded
label_batch[0]
<tf.Tensor: shape=(120,), dtype=float32, numpy=
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0.], dtype=float32)>
Excellent! Looks like our labels are indeed one-hot encoded.
Now let's create our loss function as tf.keras.losses.CategoricalCrossentropy(from_logits=False) or "categorical_crossentropy" for short.
We set from_logits=False (this is the default) because our model uses activation="softmax" in the final layer so it's outputting prediction probabilities rather than logits (without activation="softmax" the outputs of our model would be referred to as logits, I'll leave this for extra-curricular investigation).
# Create our loss function
loss = tf.keras.losses.CategoricalCrossentropy(from_logits=False) # use from_logits=False if using an activation function in final layer of model (default)
loss
<keras.src.losses.CategoricalCrossentropy at 0x7932c12ebeb0>
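To demystify what this loss function computes, here's a minimal NumPy sketch of categorical crossentropy by hand on a single toy one-hot label (illustrative values, not our model's outputs):

```python
import numpy as np

# Categorical crossentropy: loss = -sum(y_true * log(y_pred)).
# Only the probability assigned to the true class contributes to the loss.
y_true = np.array([0.0, 0.0, 1.0])   # one-hot label (class 2 is correct)
y_pred = np.array([0.1, 0.2, 0.7])   # model's softmax prediction probabilities

loss_value = -np.sum(y_true * np.log(y_pred))
print(round(loss_value, 4))  # 0.3567, i.e. -log(0.7); a perfect prediction gives -log(1.0) = 0
```

Notice that the more probability the model puts on the correct class, the closer the loss gets to 0.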
TK - Which metrics should I use?¶
The evaluation metric is a human-readable value which is used to see how well your model is performing.
A slightly confusing concept is that the evaluation metric and loss function can be the same equation.
However, the main difference between a loss function and an evaluation metric is that the loss function will typically be differentiable (there are some exceptions to the rule but in most cases, the loss function will be differentiable).
Whereas the evaluation metric does not have to be differentiable.
In the case of regression (predicting a number), your loss function and evaluation metric could be mean squared error (MSE).
Whereas in the case of classification, your loss function will generally be binary crossentropy (for two classes) or categorical crossentropy (for multiple classes) and your evaluation metric(s) could be accuracy, F1-score, precision and/or recall.
TensorFlow provides many pre-built metrics in the tf.keras.metrics module.
| Evaluation Metric | Problem Type | Code |
|---|---|---|
| Accuracy | Classification | tf.keras.metrics.Accuracy or "accuracy" for short |
| Precision | Classification | tf.keras.metrics.Precision |
| Recall | Classification | tf.keras.metrics.Recall |
| F1 Score | Classification | tf.keras.metrics.F1Score |
| Mean Squared Error (MSE) | Regression | tf.keras.metrics.MeanSquaredError or "mse" for short |
| Mean Absolute Error (MAE) | Regression | tf.keras.metrics.MeanAbsoluteError or "mae" |
| Area Under the ROC Curve (AUC-ROC) | Binary Classification | tf.keras.metrics.AUC with curve='ROC' |
The tf.keras.Model.compile() method expects the metrics parameter input as a list.
Since we're working with a classification problem, let's setup our evaluation metric as accuracy.
# Create list of evaluation metrics
metrics = ["accuracy"]
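For intuition, here's a minimal NumPy sketch of how accuracy can be computed from one-hot labels and prediction probabilities (toy values for illustration): take the argmax of each, then measure how often they agree.

```python
import numpy as np

# Accuracy by hand: fraction of samples where the predicted class matches the true class.
y_true = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1],
                   [0, 1, 0]])  # one-hot labels for 4 samples
y_pred = np.array([[0.1, 0.8, 0.1],
                   [0.3, 0.5, 0.2],
                   [0.2, 0.2, 0.6],
                   [0.1, 0.7, 0.2]])  # softmax prediction probabilities

accuracy = np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1))
print(accuracy)  # 3 out of 4 predictions correct -> 0.75
```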
TK - Learn more on how a model learns¶
We've briefly touched on optimizers, loss functions, gradient descent and backpropagation, the backbone of neural network learning. However, for a more in-depth look at each of these, I'd check out the following:
- 3Blue1Brown's series on Neural Networks - a fantastic 4 part video series on how neural networks are built to how they learn through gradient descent and backpropagation.
- The Little Book of Deep Learning by François Fleuret - a free ~150 page booklet on the ins and outs of deep learning. The notation may be intimidating at first but with practice you will begin to understand it.
TK - Putting it all together and compiling our model¶
Phew!
We've now been through all the main steps in compiling a model:
- Creating the optimizer.
- Creating the loss function.
- Creating the evaluation metrics.
Now let's put everything we've done together and compile our model_0.
First we'll do it with shortcuts (e.g. "accuracy") then we'll do it with specific classes.
# Compile model with shortcuts (faster to write code but less customizable)
model_0.compile(optimizer="adam",
loss="categorical_crossentropy",
metrics=["accuracy"])
# Compile model with classes (will do the same as above)
model_0.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),
metrics=["accuracy"])
TK - Fitting a model on the data¶
Model created and compiled!
Time to fit it to the data.
This means we're going to pass all of the data we have (dog images and their assigned labels) through our model and ask it to try and learn the relationship between the images and the labels.
Fitting the model is step 3 in our list:
- Creating the model - We've done this ✅.
- Compiling the model - We've done this ✅.
- Fitting the model - Here's where we'll show the model examples of what we'd like it to learn (e.g. the relationship between an image of a dog and its breed).
- Evaluating the model - Once our model is trained on the training data, we can evaluate it on the testing data (data the model has never seen).
- Making a custom prediction - Finally, the best way to test a machine learning model is by seeing how it goes on custom data. This is where we'll try to make a prediction on our own custom images of dogs.
We can fit our model_0 instance with the tf.keras.Model.fit() method.
The main parameters of the fit() method we'll be paying attention to are:
- x = What data do you want the model to train on?
- y = What labels do you want your model to learn to associate with your data?
- batch_size = The number of samples your model will look at per gradient update (e.g. 32 samples at a time before updating its internal patterns).
- epochs = How many times do you want the model to go through all samples (e.g. epochs=5 means looking at all of the data 5 times)?
- validation_data = What data do you want to evaluate your model's learning on?
There are plenty more options in the TensorFlow/Keras documentation for the fit() method.
However, these options will be more than enough for us.
In our case, let's keep our experiments quick and set the following:
- x=train_10_percent_ds - Since we've crafted a tf.data.Dataset, our x and y values are combined into one. We'll also start by training on 10% of the data for quicker experimentation (if things work on a smaller subset of the data, we can always increase it).
- epochs=5 - The more epochs you do, the more opportunities your model has to learn patterns, however, it also prolongs training.
- validation_data=test_ds - We'll evaluate the model's learning on the test dataset (samples it's never seen before).
Let's do it!
Time to train our first neural network and bring Dog Vision 🐶👁️ to life!
Note: If you don't have a GPU here, training will likely take a considerable amount of time. You can activate a GPU in Google Colab by going to Runtime -> Change runtime type -> Hardware accelerator -> GPU. Note that changing a runtime type will mean you will have to restart your runtime and rerun all of the cells above.
# Fit model_0 for 5 epochs
epochs = 5
history_0 = model_0.fit(x=train_10_percent_ds,
epochs=epochs,
validation_data=test_ds)
Epoch 1/5
38/38 [==============================] - 26s 460ms/step - loss: 3.9767 - accuracy: 0.2867 - val_loss: 3.0299 - val_accuracy: 0.5388
Epoch 2/5
38/38 [==============================] - 13s 358ms/step - loss: 2.0423 - accuracy: 0.7933 - val_loss: 1.8357 - val_accuracy: 0.7016
Epoch 3/5
38/38 [==============================] - 13s 358ms/step - loss: 1.0421 - accuracy: 0.8950 - val_loss: 1.2664 - val_accuracy: 0.7653
Epoch 4/5
38/38 [==============================] - 13s 358ms/step - loss: 0.6059 - accuracy: 0.9533 - val_loss: 1.0012 - val_accuracy: 0.8023
Epoch 5/5
38/38 [==============================] - 13s 359ms/step - loss: 0.4053 - accuracy: 0.9717 - val_loss: 0.8705 - val_accuracy: 0.8110
Woah!!!
Looks like our model performed outstandingly well!
Achieving a validation accuracy of ~80% after just 5 epochs of training.
That's far better than the original Stanford Dogs paper results of 22% accuracy.
How?
That's the power of transfer learning (and a series of modern updates to neural network architectures, hardware and training regimes)!
But these are just numbers on a page.
We'll get more in-depth on evaluations shortly.
For now, let's do a recap on the 3 steps we've practiced: create, compile, fit.
TK - Putting it all together: create, compile, fit¶
Let's practice what we've done so far to train our first neural network.
Specifically, we're going to:
- Create a model (using our create_model() function).
- Compile our model (selecting our optimizer, loss function and evaluation metric).
- Fit our model (get it to figure out the patterns between images and labels).
And later on, we'll get to the other steps of evaluation and making custom predictions.
# 1. Create a model
model_0 = create_model(num_classes=len(dog_names))
# 2. Compile the model
model_0.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
loss="categorical_crossentropy",
metrics=["accuracy"])
# 3. Fit the model
epochs = 5
history_0 = model_0.fit(x=train_10_percent_ds,
epochs=epochs,
validation_data=test_ds)
Epoch 1/5
38/38 [==============================] - 21s 402ms/step - loss: 3.9618 - accuracy: 0.3008 - val_loss: 2.9954 - val_accuracy: 0.5790
Epoch 2/5
38/38 [==============================] - 13s 358ms/step - loss: 2.0118 - accuracy: 0.7958 - val_loss: 1.8101 - val_accuracy: 0.7157
Epoch 3/5
38/38 [==============================] - 13s 359ms/step - loss: 1.0246 - accuracy: 0.9058 - val_loss: 1.2555 - val_accuracy: 0.7717
Epoch 4/5
38/38 [==============================] - 14s 364ms/step - loss: 0.5994 - accuracy: 0.9550 - val_loss: 0.9974 - val_accuracy: 0.7990
Epoch 5/5
38/38 [==============================] - 14s 366ms/step - loss: 0.4009 - accuracy: 0.9717 - val_loss: 0.8619 - val_accuracy: 0.8115
Nice! We just trained our second neural network!
We practice these steps because they will be part of many of your future machine learning workflows.
As an extension, you could create a function called create_and_compile() which does the first two steps in one hit.
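Here's one way such a create_and_compile() helper could look (a hypothetical sketch, not from the notebook; model_factory stands in for our create_model() function so the helper itself stays generic):

```python
# Hypothetical helper combining the create and compile steps into one call.
def create_and_compile(num_classes: int,
                       model_factory,
                       model_name: str = "model"):
    """Create a model via `model_factory` (e.g. our create_model() function),
    then compile it with Adam, categorical crossentropy and accuracy,
    the same settings we've been using above."""
    model = model_factory(num_classes=num_classes, model_name=model_name)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage in the notebook would then look something like:
# model_0 = create_and_compile(num_classes=len(class_names),
#                              model_factory=create_model)
```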
Now we've got a trained model, let's get to evaluating it.
TK - Evaluate Model 0 on the test data¶
Alright, the next step in our journey is to evaluate our trained model.
In fact, evaluating a model is just as important as training a model.
There are several ways to evaluate a model:
- Look at the metrics (such as accuracy).
- Plot the loss curves.
- Make predictions on the test set and compare them to the truth labels.
- Make predictions on custom samples (not contained in the training or test sets).
We've done the first one, as these metrics were the outputs of our model training.
Now we're going to focus on the next two.
Plotting loss curves and making predictions on the test set.
We'll get to custom images later on.
So what are loss curves?
Loss curves are a visualization of how your model's loss value changes over time.
We say loss "curves" because you can have one for each dataset split: training, validation and test.
An ideal loss curve will start high and move towards zero (a perfect model will have a loss value of zero).
How do we get a loss curve?
We could manually plot the loss values output from our model training.
Or we could programmatically get the values thanks to the History object.
This object is returned by the fit method of tf.keras.Model instances.
And we've already got one!
It's saved to history_0 (the model history for model_0).
The History.history attribute contains a record of the training loss values and evaluation metrics for each epoch.
Let's check it out.
# Inspect History.history attribute for model_0
history_0.history
{'loss': [3.9267802238464355,
2.005645990371704,
1.0120075941085815,
0.5953215956687927,
0.3982667922973633],
'accuracy': [0.3008333444595337,
0.79666668176651,
0.9058333039283752,
0.9524999856948853,
0.9708333611488342],
'val_loss': [2.9705166816711426,
1.8010386228561401,
1.2489967346191406,
0.9901936054229736,
0.8602062463760376],
'val_accuracy': [0.5708624720573425,
0.7153846025466919,
0.775291383266449,
0.8039627075195312,
0.8155011534690857]}
Wonderful!
We've got a history of our model training over time.
It looks like everything is moving in the right direction.
Loss is going down whilst accuracy is going up.
How about we adhere to the data explorer's motto and write a function to visualize, visualize, visualize!
We'll call the function plot_model_loss_curves() and it'll take a History.history object as input and then plot loss and accuracy curves using matplotlib.
def plot_model_loss_curves(history: tf.keras.callbacks.History) -> None:
"""Takes a History object and plots loss and accuracy curves."""
# Get the accuracy values
acc = history.history["accuracy"]
val_acc = history.history["val_accuracy"]
# Get the loss values
loss = history.history["loss"]
val_loss = history.history["val_loss"]
# Get the number of epochs
epochs_range = range(len(acc))
# Create accuracy curves plot
plt.figure(figsize=(14, 7))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label="Training Accuracy")
plt.plot(epochs_range, val_acc, label="Validation Accuracy")
plt.legend(loc="lower right")
plt.title("Training and Validation Accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
# Create loss curves plot
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label="Training Loss")
plt.plot(epochs_range, val_loss, label="Validation Loss")
plt.legend(loc="upper right")
plt.title("Training and Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.show()
plot_model_loss_curves(history=history_0)
Woohoo! Now those are some nice looking curves.
Our model is doing exactly what we'd like it to do.
The accuracy is moving up while the loss is going down.
You may be wondering why there's a gap between the training and validation loss curves.
Ideally, the two lines would closely follow each other.
In our case, the validation loss doesn't decrease as low as the training loss.
This is known as overfitting, a common problem in machine learning where a model learns the training data very well but doesn't generalize to other unseen data.
You can think of this as a university student memorizing the course materials but failing to apply that knowledge to problems that aren't in the course materials (real-world problems).
The reverse of overfitting is underfitting, which is when a model fails to learn the patterns in the training data (its loss stays high even on data it has seen).
Good news is, our model isn't underfitting (it's performing at ~80% accuracy on unseen data).
I'll leave "ways to fix overfitting" as an extension.
But one of the best ways is to use more data.
And guess what?
We've got plenty more!
And remember, these results were achieved using only 10% of the training data.
Before we train a model with more data, there's another way to quickly evaluate our model on a given dataset.
And that's using the tf.keras.Model.evaluate() method.
How about we try it on our model_0?
We'll save the outputs to a model_0_results variable so we can use them later.
# Evaluate model_0, see: https://www.tensorflow.org/api_docs/python/tf/keras/Model#evaluate
model_0_results = model_0.evaluate(x=test_ds)
model_0_results
269/269 [==============================] - 12s 45ms/step - loss: 0.8602 - accuracy: 0.8155
[0.8602062463760376, 0.8155011534690857]
Beautiful!
Evaluating our model on the test data shows it's performing at ~80% accuracy despite only seeing 10% of the training data.
We can also get the metrics used by our model with the metrics_names attribute.
# Get our model's metrics names
model_0.metrics_names
['loss', 'accuracy']
TK - Model 1 - Train a model on 100% of the training data¶
Time to step it up a notch!
We've trained a model on 10% of the training data (to see if it works and it did!), now let's train a model on 100% of the training data and see what happens.
But before we do...
What do you think will happen?
If our model was able to perform well on only 10% of the data, how do you think it will go on 100% of the data?
These types of questions are good to think about in the world of machine learning.
After all, that's why the machine learner's motto is experiment, experiment, experiment!
Let's follow our three steps from before:
- Create a model (using our create_model() function).
- Compile our model (selecting our optimizer, loss function and evaluation metric).
- Fit our model (this time on 100% of the data for 5 epochs).
Note: Fitting our model on such a large amount of data will take a long time without a GPU. If you're using Google Colab, you can access a GPU via Runtime -> Change runtime type -> Hardware accelerator -> GPU.
# 1. Create model_1 (the next iteration of model_0)
model_1 = create_model(num_classes=len(class_names),
model_name="model_1")
# 2. Compile model
model_1.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
loss="categorical_crossentropy",
metrics=["accuracy"])
# 3. Fit model
epochs=5
history_1 = model_1.fit(x=train_ds,
epochs=epochs,
validation_data=test_ds)
Epoch 1/5
375/375 [==============================] - 43s 83ms/step - loss: 1.2809 - accuracy: 0.7609 - val_loss: 0.4895 - val_accuracy: 0.8742
Epoch 2/5
375/375 [==============================] - 29s 77ms/step - loss: 0.3678 - accuracy: 0.9014 - val_loss: 0.4135 - val_accuracy: 0.8756
Epoch 3/5
375/375 [==============================] - 29s 77ms/step - loss: 0.2622 - accuracy: 0.9304 - val_loss: 0.3876 - val_accuracy: 0.8767
Epoch 4/5
375/375 [==============================] - 29s 77ms/step - loss: 0.2030 - accuracy: 0.9485 - val_loss: 0.3737 - val_accuracy: 0.8821
Epoch 5/5
375/375 [==============================] - 29s 78ms/step - loss: 0.1607 - accuracy: 0.9613 - val_loss: 0.3680 - val_accuracy: 0.8818
Woah!
Was your intuition correct?
Did what you thought happen?
It looks like all that extra data helped our model quite a bit, it's now performing at close to ~90% accuracy on the test set!
Question: How many epochs should I fit for?
Generally with transfer learning you can get pretty good results quite quickly, however, you may want to look into training for longer (more epochs) as an experiment to see whether your model improves or not. What we've performed is a transfer learning technique called feature extraction. You may also want to look further into fine-tuning (training the whole model on your own dataset) and using callbacks (functions that run during model training), such as EarlyStopping, to prevent the model from training for so long that its performance begins to degrade.
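As a quick sketch of the callback idea, TensorFlow ships an EarlyStopping callback you can pass to fit. The monitor, patience and restore_best_weights values below are illustrative defaults, not tuned choices:

```python
import tensorflow as tf

# EarlyStopping watches a metric (here the validation loss) and stops training
# once it stops improving for `patience` epochs in a row.
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",         # metric to watch
    patience=3,                 # epochs with no improvement before stopping
    restore_best_weights=True,  # roll back to the best epoch's weights
)

# You'd then pass it to fit, e.g.:
# history = model.fit(train_ds,
#                     epochs=20,
#                     validation_data=test_ds,
#                     callbacks=[early_stopping])
```

With this in place you can set a generous number of epochs and let the callback decide when to stop.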
TK - Evaluate Model 1 on the test data¶
How about we evaluate our model_1?
Let's start by plotting loss curves with the data contained within history_1.
# Plot model_1 loss curves
plot_model_loss_curves(history=history_1)
Hmm, looks like our model performed well, however, the validation accuracy and loss seem to have flattened out.
Whereas the training accuracy and loss kept improving.
This is a sign of overfitting (model performing much better on the training set than the validation/test set).
However, since our model looks to be performing quite well I'll leave this overfitting problem as a research project for extra-curriculum.
For now, let's evaluate our model on the test dataset using the evaluate() method.
# Evaluate model_1
model_1_results = model_1.evaluate(test_ds)
269/269 [==============================] - 12s 45ms/step - loss: 0.3680 - accuracy: 0.8818
Nice!
Looks like that extra data boosted our model's performance from ~80% to ~90% accuracy on the test set (note: exact numbers here may vary due to the inherent randomness in machine learning models).
Extension: Putting it all together
As a potential extension, you may want to try practicing putting all of the steps we've been through so far together. As in, loading the data, creating the model, compiling the model, fitting the model and evaluating the model. That's what I've found is one of the best ways to learn ML problems, end to end.
TK - Make and evaluate predictions of the best model¶
# This will output logits (as long as softmax activation isn't in the model)
test_preds = model_1.predict(test_ds)
# Note: If not using activation="softmax" in last layer of model, may need to turn them into prediction probabilities (easier to understand)
# test_preds = tf.keras.activations.softmax(tf.constant(test_preds), axis=-1)
269/269 [==============================] - 7s 21ms/step
test_preds.shape
(8580, 120)
test_preds[0].shape, tf.argmax(test_preds[0])
((120,), <tf.Tensor: shape=(), dtype=int64, numpy=0>)
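As the comment above notes, if your model's last layer doesn't use activation="softmax", predict() returns raw logits rather than prediction probabilities. Here's a minimal numpy sketch of the conversion (the logits values are made up for illustration):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the max logit for numerical stability before exponentiating
    exps = np.exp(logits - logits.max())
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.1])  # hypothetical raw model outputs for 3 classes
probs = softmax(logits)             # prediction probabilities, sum to 1
pred_index = int(probs.argmax())    # index of the highest probability class
```

The argmax of the logits and of the probabilities is always the same class, so you only need softmax when you want human-readable confidence values.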
import numpy as np
test_ds_images = np.concatenate([images for images, labels in test_ds], axis=0)
test_ds_labels = np.concatenate([labels for images, labels in test_ds], axis=0)
test_ds_labels[0], test_ds_images[0]
(0,
array([[[ 43.804947, 44.804947, 38.804947],
[ 39.12483 , 40.12483 , 34.12483 ],
[ 82.701065, 83.701065, 77.62723 ],
...,
[ 21.578135, 25.578135, 24.578135],
[ 19.741274, 23.741274, 22.741274],
[ 15.660867, 19.660868, 18.660868]],
[[ 40.762886, 41.762886, 35.41467 ],
[ 38.87469 , 39.87469 , 33.526478],
[ 84.99161 , 85.99161 , 78.59259 ],
...,
[ 20.462063, 24.462063, 23.462063],
[ 19.207607, 23.207607, 22.207607],
[ 18.408989, 22.408989, 21.408989]],
[[ 37.69817 , 38.69817 , 30.698172],
[ 42.096752, 43.096752, 35.096752],
[ 93.62746 , 94.81206 , 86.258255],
...,
[ 19.530594, 23.530594, 22.530594],
[ 18.091536, 22.091536, 21.091536],
[ 19.202106, 23.202106, 22.202106]],
...,
[[106.28673 , 70.28673 , 96.28673 ],
[105.43164 , 69.43164 , 95.43164 ],
[106.95825 , 70.95825 , 96.95825 ],
...,
[140.48886 , 111.48886 , 129.48886 ],
[135.05005 , 106.05005 , 124.05005 ],
[141.57098 , 112.57099 , 130.57098 ]],
[[105.29309 , 69.29309 , 95.29309 ],
[108.00053 , 72.00053 , 98.00053 ],
[108.30019 , 72.30019 , 98.30019 ],
...,
[140.50948 , 111.50948 , 129.50948 ],
[137.474 , 108.474 , 126.474 ],
[142.99866 , 113.99865 , 131.99866 ]],
[[104.55957 , 68.55957 , 94.55957 ],
[108.76339 , 72.76339 , 98.76339 ],
[108.72768 , 72.72768 , 98.72768 ],
...,
[140.63617 , 111.63617 , 129.63617 ],
[138.14514 , 109.14514 , 127.14514 ],
[141.46515 , 112.46516 , 130.46515 ]]], dtype=float32))
# Choose a random 10 indexes from the test data and compare the values
import random
random_indexes = random.sample(range(len(test_ds_images)), 10)
# TK - this is why we don't shuffle the test data
fig, axes = plt.subplots(2, 5, figsize=(15, 7))
for i, ax in enumerate(axes.flatten()):
target_index = random_indexes[i]
# Get relevant target image, label, prediction and prediction probabilities
test_image = test_ds_images[target_index]
test_image_truth_label = class_names[test_ds_labels[target_index]]
test_image_pred_probs = test_preds[target_index]
test_image_pred_class = class_names[tf.argmax(test_image_pred_probs)]
# Plot the image
ax.imshow(test_image.astype("uint8"))
# Create sample title
title = f"""True: {test_image_truth_label}
Pred: {test_image_pred_class}
Prob: {np.max(test_image_pred_probs):.2f}"""
# Colour the title based on correctness of pred
ax.set_title(title,
color="green" if test_image_truth_label == test_image_pred_class else "red")
ax.axis("off")
TK - Accuracy per class¶
# TK - get accuracy values per class and show how they compare to the original results
# see: http://vision.stanford.edu/aditya86/ImageNetDogs/ -> http://vision.stanford.edu/aditya86/ImageNetDogs/bar_graph_full.png
# Want to compare test_preds + test_labels on a per class basis
# Can I convert both of these into a DataFrame and see what happens?
test_preds_labels = test_preds.argmax(axis=-1)
test_preds_labels
array([ 0, 0, 0, ..., 102, 119, 119])
test_ds_labels
array([ 0, 0, 0, ..., 119, 119, 119], dtype=int32)
test_results_df = pd.DataFrame({"test_pred_label": test_preds_labels,
"test_pred_prob": np.max(test_preds, axis=-1),
"test_pred_class_name": [class_names[test_pred_label] for test_pred_label in test_preds_labels],
"test_truth_label": test_ds_labels,
"test_truth_class_name": [class_names[test_truth_label] for test_truth_label in test_ds_labels]})
test_results_df["correct"] = (test_results_df["test_pred_class_name"] == test_results_df["test_truth_class_name"]).astype(int)
test_results_df
| | test_pred_label | test_pred_prob | test_pred_class_name | test_truth_label | test_truth_class_name | correct |
|---|---|---|---|---|---|---|
| 0 | 0 | 0.981926 | affenpinscher | 0 | affenpinscher | 1 |
| 1 | 0 | 0.747863 | affenpinscher | 0 | affenpinscher | 1 |
| 2 | 0 | 0.995609 | affenpinscher | 0 | affenpinscher | 1 |
| 3 | 44 | 0.467855 | flat_coated_retriever | 0 | affenpinscher | 0 |
| 4 | 0 | 0.997168 | affenpinscher | 0 | affenpinscher | 1 |
| ... | ... | ... | ... | ... | ... | ... |
| 8575 | 119 | 0.785783 | yorkshire_terrier | 119 | yorkshire_terrier | 1 |
| 8576 | 102 | 0.735301 | silky_terrier | 119 | yorkshire_terrier | 0 |
| 8577 | 102 | 0.828518 | silky_terrier | 119 | yorkshire_terrier | 0 |
| 8578 | 119 | 0.940582 | yorkshire_terrier | 119 | yorkshire_terrier | 1 |
| 8579 | 119 | 0.603093 | yorkshire_terrier | 119 | yorkshire_terrier | 1 |
8580 rows × 6 columns
# Calculate accuracy per class
accuracy_per_class = test_results_df.groupby("test_truth_class_name")["correct"].mean()
accuracy_per_class_df = pd.DataFrame(accuracy_per_class).reset_index().sort_values("correct", ascending=False)
accuracy_per_class_df
# pd.DataFrame(accuracy_per_class).sort_values("correct", ascending=False)
| | test_truth_class_name | correct |
|---|---|---|
| 62 | keeshond | 1.000000 |
| 10 | bedlington_terrier | 1.000000 |
| 30 | chow | 0.989583 |
| 92 | saint_bernard | 0.985714 |
| 2 | african_hunting_dog | 0.985507 |
| ... | ... | ... |
| 76 | miniature_poodle | 0.600000 |
| 5 | appenzeller | 0.588235 |
| 104 | staffordshire_bullterrier | 0.581818 |
| 16 | border_collie | 0.560000 |
| 43 | eskimo_dog | 0.440000 |
120 rows × 2 columns
# Let's create a horizontal bar chart to replicate a similar plot to the original Stanford Dogs page
plt.figure(figsize=(10, 17))
plt.barh(y=accuracy_per_class_df["test_truth_class_name"],
width=accuracy_per_class_df["correct"])
plt.xlabel("Accuracy")
plt.ylabel("Class Name")
plt.title("Dog Vision Accuracy per Class")
plt.ylim(-0.5, len(accuracy_per_class_df["test_truth_class_name"]) - 0.5) # Adjust y-axis limits to reduce white space
plt.gca().invert_yaxis() # This will display the first class at the top
plt.tight_layout()
plt.show()
TK - How does this compare to the original results?
TK - Finding the most wrong examples¶
# Get most wrong
top_100_most_wrong = test_results_df[test_results_df["correct"] == 0].sort_values("test_pred_prob", ascending=False)[:100]
top_100_most_wrong
| | test_pred_label | test_pred_prob | test_pred_class_name | test_truth_label | test_truth_class_name | correct |
|---|---|---|---|---|---|---|
| 2727 | 75 | 0.993720 | miniature_pinscher | 38 | doberman | 0 |
| 6884 | 54 | 0.993490 | groenendael | 95 | schipperke | 0 |
| 5480 | 44 | 0.990781 | flat_coated_retriever | 78 | newfoundland | 0 |
| 7630 | 4 | 0.988580 | american_staffordshire_terrier | 104 | staffordshire_bullterrier | 0 |
| 4155 | 55 | 0.986820 | ibizan_hound | 60 | italian_greyhound | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 2644 | 63 | 0.882950 | kelpie | 37 | dingo | 0 |
| 7934 | 73 | 0.882824 | maltese_dog | 109 | tibetan_terrier | 0 |
| 1059 | 14 | 0.881923 | bloodhound | 12 | black_and_tan_coonhound | 0 |
| 2047 | 86 | 0.879732 | pembroke | 27 | cardigan | 0 |
| 4601 | 15 | 0.878603 | bluetick | 67 | labrador_retriever | 0 |
100 rows × 6 columns
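The "most wrong" trick above (wrong predictions sorted by confidence, highest first) can be sketched on a toy DataFrame with made-up values:

```python
import pandas as pd

# Hypothetical prediction results: three wrong (correct == 0), two right
toy_df = pd.DataFrame({
    "pred_prob": [0.99, 0.55, 0.91, 0.60, 0.97],
    "correct":   [0,    1,    0,    0,    1],
})

# Wrong predictions, most confident first: these are the "most wrong"
most_wrong = toy_df[toy_df["correct"] == 0].sort_values("pred_prob",
                                                        ascending=False)
```

Samples the model gets wrong with high confidence are the most informative to inspect by hand: they often reveal mislabeled data or genuinely confusable classes.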
top_100_most_wrong.sample(n=10).index
7804
# Choose a random 10 indexes from the test data and compare the values
import random
random_most_wrong_indexes = top_100_most_wrong.sample(n=10).index
# TK - this is why we don't shuffle the test data
fig, axes = plt.subplots(2, 5, figsize=(15, 7))
for i, ax in enumerate(axes.flatten()):
target_index = random_most_wrong_indexes[i]
# Get relevant target image, label, prediction and prediction probabilities
test_image = test_ds_images[target_index]
test_image_truth_label = class_names[test_ds_labels[target_index]]
test_image_pred_probs = test_preds[target_index]
test_image_pred_class = class_names[tf.argmax(test_image_pred_probs)]
# Plot the image
ax.imshow(test_image.astype("uint8"))
# Create sample title
title = f"""True: {test_image_truth_label}
Pred: {test_image_pred_class}
Prob: {np.max(test_image_pred_probs):.2f}"""
# Colour the title based on correctness of pred
ax.set_title(title,
color="green" if test_image_truth_label == test_image_pred_class else "red",
fontsize=10)
ax.axis("off")
TK - Create a confusion matrix¶
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
fig, ax = plt.subplots(figsize=(25, 25))
confusion_matrix_dog_preds = confusion_matrix(y_true=test_ds_labels,
y_pred=test_preds_labels)
confusion_matrix_display = ConfusionMatrixDisplay(confusion_matrix=confusion_matrix_dog_preds,
display_labels=class_names)
# See: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.ConfusionMatrixDisplay.html#sklearn.metrics.ConfusionMatrixDisplay.plot
ax.set_title("Dog Vision Confusion Matrix")
confusion_matrix_display.plot(xticks_rotation="vertical",
cmap="Blues",
colorbar=False,
ax=ax);
TK - Save and load the best model¶
See here: https://www.tensorflow.org/tutorials/keras/save_and_load#new_high-level_keras_format
TK Note: You may also see the "SavedModel" format as well as ".hdf5" formats...
# Save the model to .keras
model_1.save("dog_vision_model.keras")
# Load the model
loaded_model = tf.keras.models.load_model("dog_vision_model.keras")
# Evaluate the loaded model
loaded_model_results = loaded_model.evaluate(test_ds)
269/269 [==============================] - 10s 26ms/step - loss: 0.3711 - accuracy: 0.8787
# Note: exact equality can fail due to small nondeterministic differences between evaluation runs, so compare with a tolerance
assert np.allclose(model_1_results, loaded_model_results, atol=0.05)
TK - Make predictions on custom images with the best model¶
# TK - load custom image(s)
!wget -nc https://github.com/mrdbourke/zero-to-mastery-ml/raw/master/images/dog-photos.zip
!unzip dog-photos.zip
--2023-10-26 03:02:51--  https://github.com/mrdbourke/zero-to-mastery-ml/raw/master/images/dog-photos.zip
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/mrdbourke/zero-to-mastery-ml/master/images/dog-photos.zip [following]
--2023-10-26 03:02:51--  https://raw.githubusercontent.com/mrdbourke/zero-to-mastery-ml/master/images/dog-photos.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1091355 (1.0M) [application/zip]
Saving to: ‘dog-photos.zip’

dog-photos.zip      100%[===================>]   1.04M  --.-KB/s    in 0.02s

2023-10-26 03:02:52 (51.5 MB/s) - ‘dog-photos.zip’ saved [1091355/1091355]

Archive:  dog-photos.zip
  inflating: dog-photo-4.jpeg
  inflating: dog-photo-1.jpeg
  inflating: dog-photo-2.jpeg
  inflating: dog-photo-3.jpeg
# View images
custom_image_paths = ["dog-photo-1.jpeg",
"dog-photo-2.jpeg",
"dog-photo-3.jpeg",
"dog-photo-4.jpeg"]
fig, axes = plt.subplots(1, 4, figsize=(15, 7))
for i, ax in enumerate(axes.flatten()):
ax.imshow(plt.imread(custom_image_paths[i]))
ax.axis("off")
# def plot_10_random_images_from_path_list(path_list: list):
# fig, axes = plt.subplots(2, 5, figsize=(20, 10))
# samples = random.sample(path_list, 10)
# for i, ax in enumerate(axes.flatten()):
# sample_path = samples[i]
# sample_title = sample_path.parent.stem
# ax.imshow(plt.imread(sample_path))
# ax.set_title(sample_title)
# ax.axis("off")
# This will error...
loaded_model.predict("dog-photo-1.jpeg")
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-97-bf77d5e16d11> in <cell line: 2>()
      1 # This will error...
----> 2 loaded_model.predict("dog-photo-1.jpeg")

/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py in error_handler(*args, **kwargs)
     68     # To get the full stack trace, call:
     69     # `tf.debugging.disable_traceback_filtering()`
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/tensor_shape.py in __getitem__(self, key)
    957     else:
    958       if self._v2_behavior:
--> 959         return self._dims[key]
    960       else:
    961         return self.dims[key]

IndexError: tuple index out of range
# Model needs to make predictions on images in same format it was trained on
# Load the image (into PIL format)
custom_image = tf.keras.utils.load_img(
path="dog-photo-1.jpeg",
color_mode="rgb",
target_size=(img_size, img_size),
)
custom_image
# Turn the image into a tensor
custom_image_tensor = tf.keras.utils.img_to_array(custom_image)
custom_image_tensor.shape
(224, 224, 3)
loaded_model.predict(custom_image_tensor)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-101-f7a0c54105c0> in <cell line: 1>()
----> 1 loaded_model.predict(custom_image_tensor)

/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py in error_handler(*args, **kwargs)
     68     # To get the full stack trace, call:
     69     # `tf.debugging.disable_traceback_filtering()`
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py in tf__predict_function(iterator)
     13 try:
     14     do_return = True
---> 15     retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
     16 except:
     17     do_return = False

ValueError: in user code:

    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 2416, in predict_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 2401, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 2389, in run_step  **
        outputs = model.predict_step(data)
    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 2357, in predict_step
        return self(x, training=False)
    File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/input_spec.py", line 298, in assert_input_compatibility
        raise ValueError(

    ValueError: Input 0 of layer "model_1" is incompatible with the layer: expected shape=(None, 224, 224, 3), found shape=(32, 224, 3)
pred_probs = loaded_model.predict(tf.expand_dims(custom_image_tensor, axis=0))
# pred_probs = tf.keras.activations.softmax(tf.constant(pred_probs)) # if you have no activation="softmax" in your model
class_names[tf.argmax(pred_probs, axis=-1).numpy()[0]]
1/1 [==============================] - 2s 2s/step
'labrador_retriever'
tf.expand_dims(custom_image_tensor, axis=0).shape
TensorShape([1, 224, 224, 3])
Note: TK - In the case of some models you may need to rescale your input values (e.g. from [0, 255] to [0, 1]) here; in our case, the Rescaling layer is built into the model.
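For models without a built-in Rescaling layer, here's a minimal sketch of what that rescaling step could look like (the dummy image below is illustrative, not one of our dog photos):

```python
import tensorflow as tf

# Scales pixel values from [0, 255] down to [0, 1]
rescale = tf.keras.layers.Rescaling(scale=1. / 255)

# A dummy all-white image in the same (batch, height, width, channels) format
fake_image = tf.ones(shape=(1, 224, 224, 3)) * 255.0
rescaled_image = rescale(fake_image)  # values now in [0, 1]
```

You could drop a layer like this in front of a model (or apply it in a preprocessing function) so custom images match the value range the model was trained on.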
def pred_on_custom_image(image_path,
model,
target_size=224,
class_names=class_names,
plot=True):
# Prepare and load image
custom_image = tf.keras.utils.load_img(
path=image_path,
color_mode="rgb",
target_size=(target_size, target_size),
)
# Turn the image into a tensor
custom_image_tensor = tf.keras.utils.img_to_array(custom_image)
# Add a batch dimension to the target tensor (e.g. (224, 224, 3) -> (1, 224, 224, 3))
custom_image_tensor = tf.expand_dims(custom_image_tensor, axis=0)
# Make a prediction with the target model
pred_probs = model.predict(custom_image_tensor)
# pred_probs = tf.keras.activations.softmax(tf.constant(pred_probs))
pred_class = class_names[tf.argmax(pred_probs, axis=-1).numpy()[0]]
# Plot if we want
if not plot:
return pred_class
else:
plt.figure(figsize=(5, 3))
plt.imshow(plt.imread(image_path))
plt.title(pred_class)
plt.axis("off")
pred_on_custom_image(image_path="dog-photo-2.jpeg", model=loaded_model)
# Predict on multiple images
fig, axes = plt.subplots(1, 4, figsize=(15, 7))
for i, ax in enumerate(axes.flatten()):
image_path = custom_image_paths[i]
pred_class = pred_on_custom_image(image_path=image_path,
model=loaded_model,
plot=False)
ax.imshow(plt.imread(image_path))
ax.set_title(pred_class)
ax.axis("off")
TK - Extensions & Exercises¶
- Create a machine learning app with Gradio to predict on images of dogs - https://www.gradio.app/
- Try a prediction on your own images of dogs and see if the model is correct
- Freeze the base weights (we could update the base weights, in a process known as fine-tuning, tk - see TensorFlow course)
- Try another model from tf.keras.applications, e.g. ConvNeXt
- Train a model on your own custom set of image classes, for example, apple vs banana vs orange
- More callbacks -
- Regularization techniques - data augmentation, dropout, etc
- Other models - see tf.keras.applications or Kaggle Models
- ZTM TensorFlow course -
- See further fine-tuning here
- See videos on my YouTube for a more comprehensive TensorFlow overview to get started
FAQ¶
- TK - how do I know which model to pick? (experiment, experiment, experiment!)
Takeaways¶
- Neural networks are powerful machine learning models and TensorFlow helps you build them
- For most new problems, you should generally look to see if a pretrained model exists and see if you can adapt it to your use case
- Ask yourself:
- what format is my data in? What are my ideal inputs and outputs?
- is there a pretrained model for my use case?
Extra curriculum¶
- Read about tf.data - https://www.tensorflow.org/guide/data
- Read about tf.data performance best practices
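As a small taste of those performance best practices, here's a sketch of the common cache/prefetch pattern on a toy dataset (the numbers and map function are made up for illustration):

```python
import tensorflow as tf

# A common tf.data performance pattern: map with parallel calls, cache after
# (potentially expensive) preprocessing, then prefetch so the CPU prepares
# the next batch while the accelerator works on the current one.
AUTOTUNE = tf.data.AUTOTUNE

ds = (tf.data.Dataset.range(10)
      .map(lambda x: x * 2, num_parallel_calls=AUTOTUNE)  # stand-in preprocessing
      .cache()            # cache results after the first pass
      .batch(5)           # group samples into batches
      .prefetch(AUTOTUNE))  # overlap data preparation with model execution

batches = [batch.numpy().tolist() for batch in ds]
```

The same chain (map → cache → batch → prefetch) applies directly to image datasets like our train_ds and test_ds.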
TK - Try data augmentation¶
See: https://www.tensorflow.org/tutorials/images/data_augmentation
from tensorflow.keras import layers
data_augmentation = tf.keras.Sequential(
[
layers.RandomFlip("horizontal"),
layers.RandomRotation(factor=0.2),
layers.RandomZoom(
height_factor=0.2, width_factor=0.2
),
],
name="data_augmentation"
)
base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
include_top=False,
weights='imagenet',
input_shape=(img_size, img_size, 3),
include_preprocessing=True
)
# base_model.summary()
# Freeze the base model
base_model.trainable = False
# TK - functionize this
# Create new model
inputs = tf.keras.Input(shape=(224, 224, 3))
# TK - Create data augmentation
x = data_augmentation(inputs)
# Craft model
x = base_model(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.Dense(num_classes,
name="output_layer",
activation="softmax")(x) # Note: If you have "softmax" activation, use from_logits=False in loss function
model_2 = tf.keras.Model(inputs, outputs, name="model_2")
model_2.summary()